FOREWORD


These Proceedings include the contributed papers presented at the Fourth
Baiona Workshop on Intelligent Methods for Signal Processing and
Communications, held in Baiona, Spain, June 24-26, 1996.

Six technical sessions were organized around the following topics: 

I Neural Networks and Non-linear Modeling. 

II Image Processing. 

III Array Processing and Channel Identification. 

IV Adaptive Systems in Communications. 

V Emergent Techniques. 

VI Hardware and Software Implementation. 

We hope that the arrangements we made have benefited from our previous
experience in organizing events of this kind. As on past occasions, the good
atmosphere at the Conference was fostered by the interest of the
contributions, the active involvement of all the participants, and the pleasant
venue. We thank all the authors, since they provided the essential material
and created the conditions for repeating the event with even higher quality
and participation.



One of the main objectives of this workshop was to serve as a starting point
from which further applied research initiatives can emerge. Therefore, the
involvement of the different social and economic organizations is of paramount
importance throughout the process. We very much appreciate the
understanding and support of the following Spanish institutions:

• CICYT: Accion Especial TIC95-1340-E 

• Xunta de Galicia 

• Universidad de Vigo 

• FEUGA 

• IEEE-Rama Española

We also wish to thank the United States Air Force European Office of
Aerospace Research and Development and the Office of Naval Research,
Europe, for their contribution to the success of the conference.

Finally, we once again invite researchers from universities and research
centers, as well as from industry, to participate in these events, since we
truly believe they will appreciate the value of being active in workshops of
this kind, in which the state of the art is reviewed, the latest applications
are discussed, round tables on hot topics are held, international cooperation
is fostered, and interesting contacts are made.


Domingo Docampo 
Anibal R. Figueiras 
Fernando Perez 
ABSTRACT

In this paper we present a new speech coder whose main
characteristic is the use of a nonlinear predictor consisting
of a cascade of an RBF (Radial Basis Function) network
and a linear filter. The most significant disadvantage
of this coder compared with CELP algorithms is its
higher computational cost. Here, we study how this
drawback can be noticeably reduced by shortening the
nonlinear analysis frame from which the predictor is
estimated. Simulations show that the computational cost
can be reduced by a factor of three without compromising
performance.

1. INTRODUCTION 

Neural Networks (NN) constitute a well-established
technique for signal processing due to their outstanding
properties: nonlinearity, learning from examples,
generalization ability, parallelism, fault tolerance, ease of
VLSI implementation, etc. The use of NN usually involves
two phases: learning and retrieving. The learning phase
is computationally expensive but in most cases is carried
out off-line, while the retrieving phase can be performed
efficiently in real time with an appropriate hardware
implementation.

Speech signals exhibit both nonlinearity and
nonstationarity. NN are inherently nonlinear; however, their
ability to adapt on-line to the time-varying characteristics
of the input is strongly limited by the computational burden
associated with their learning. This problem has already
been pointed out in a recent work [1].

Our work focuses on the application of adaptive
neural-network-based prediction to speech coding. The
previously cited paper considered this application and
presented a new modular, recurrent network operating
efficiently on a sample-by-sample basis. Such a network
predictor was reported to work properly in waveform
coding algorithms at high bit rates [2]. Our investigation
centers on coding schemes working at medium bit rates,
which operate on a block-by-block basis (this also holds
for low-bit-rate schemes). In our previous works [3] [4]
we proposed a cascade of a Radial Basis Function (RBF)
network and a linear filter as a nonlinear predictor for
CELP-type coders; however, in that scheme quality was
improved at the expense of computational cost, which
must be reduced to obtain practical coders. Other related
works [5] [6] do not address the problem of computational
complexity.

In this paper we present a new version of our previous
coder aimed at reducing its complexity: we examine the
balance between performance and complexity and propose
a way to improve it. As will be shown, the computational
effort of training in this block-processing approach is
proportional to the block (hereafter, frame) length.
Consequently, the frame length, which in linear schemes
affects only the cost of computing the correlation
coefficients and the buffering delay (provided that it is
short enough to avoid averaging over changing signal
characteristics and long enough to include a representative
number of samples), plays an important role in the
computational efficiency of nonlinear approaches.

The paper is organized as follows: in Section 2, a
reduced analysis frame length is proposed to improve
the efficiency of network-based block-adaptive predictors
for speech coding, and its benefits and limitations are
discussed. In Section 3, the reasons for using a cascade
of an RBF network and a linear filter as predictor are
explained. An overview of a CELP-type coder including
this type of predictor is given in Section 4, together with
simulations showing that the proposed approach
significantly reduces the computational complexity without
compromising performance. Finally, in Section 5,
conclusions are drawn and ongoing work is outlined.
2. EFFICIENCY CONSIDERATIONS FOR
NETWORK-BASED BLOCK-ADAPTIVE 
PREDICTORS 

The cost function used to train a predictive network 
usually contains a principal term of the form 

$$\sum_{n=0}^{N-1} \left( x_{n+1} - f(\mathbf{x}_n) \right)^2 \qquad (1)$$

where $\mathbf{x}_n$ is the vector of previous samples, $x_{n+1}$ is
the sample to be predicted, $f(\cdot)$ is the predictive mapping,
and $N$ is the frame length. The dimension of the vector
$\mathbf{x}_n$ is the predictor order, i.e., the number of previous
samples used for prediction. As equation (1) shows, the
squared difference between the actual value and its
prediction is evaluated for every sample of the frame.
These cost functions are usually minimized by a gradient
descent algorithm that is iterated through the training
set a number of times (epochs). The gradient expressions
involve the term corresponding to each sample, and all
these terms must be evaluated sequentially. Therefore,
the training cost is proportional to $N$.
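To make the linear dependence on $N$ concrete, the sketch below evaluates the principal term (1) for an arbitrary predictor and counts the per-sample evaluations one pass requires. It is a minimal illustration under assumed conventions: `prediction_cost` is a hypothetical helper, and the frame is assumed to carry `order` history samples before the first predicted sample, so it predicts `len(frame) - order` samples.

```python
import numpy as np

def prediction_cost(frame, predictor, order):
    """Principal term of cost (1): sum of (x_{n+1} - f(x_n))^2 over a frame.

    frame     : 1-D array whose first `order` samples are history (assumed),
                so len(frame) - order samples are predicted
    predictor : callable f mapping the `order` previous samples to a prediction
    order     : predictor order, i.e. the dimension of x_n
    Returns the cost and the number of predictor evaluations it required.
    """
    frame = np.asarray(frame, dtype=float)
    cost, evaluations = 0.0, 0
    # One sequential pass over the frame: every predicted sample needs its own
    # evaluation of f, so the work per epoch grows linearly with N.
    for n in range(order, len(frame)):
        x_n = frame[n - order:n]          # vector of previous samples
        err = frame[n] - predictor(x_n)   # actual value minus its prediction
        cost += err * err
        evaluations += 1
    return float(cost), evaluations
```

With several gradient epochs, the total work is roughly the number of epochs times `evaluations`, i.e. still proportional to $N$.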

In a parallel hardware implementation of the algorithm,
a large frame makes the dominant contribution to the
final computational cost of the learning phase, since
each iteration of the algorithm through the frame cannot
be processed in parallel.

In this work we study the possibility of reducing the
frame length from which the predictor is trained, while
preserving good generalization properties and, ultimately,
good performance. However, reducing the frame length
is subject to constraints: a very short frame means a
very small training set, which is likely not large enough
to reveal the regularities of the signal; from another
perspective, a small training set leads to poor
generalization.

Another obvious way of reducing the computational
cost of the learning phase is to minimize the number
of epochs used, so that computational resources are not
wasted in improving convergence beyond the point at
which further refinements have no significant effect on
the prediction (coding) performance.

3. REGULARIZED RBF-BASED HYBRID
PREDICTORS

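The RBF network is a single-layer network that computes

$$f(\mathbf{x}) = \sum_{i=1}^{M} c_i \, G\left(\|\mathbf{x} - \mathbf{t}_i\|\right) \qquad (2)$$

where $\{G(\cdot)\}$ are the RBFs, $\{\mathbf{t}_i\}$ are the RBF centers, $\{c_i\}$
are the weights of the linear combination, and $M$ is the
number of RBFs used. We use Gaussian RBFs

$$G(r) = \exp\left(-\frac{r^2}{2\sigma^2}\right) \qquad (3)$$

where $\sigma$ is the width (variance parameter) of the Gaussian.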
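Equations (2) and (3) translate directly into a forward pass. The sketch assumes a single width $\sigma$ shared by all centers; `gaussian_rbf` and `rbf_predict` are illustrative names, not part of the original coder.

```python
import numpy as np

def gaussian_rbf(r, sigma):
    # Gaussian radial basis function of eq. (3) with width sigma
    return np.exp(-r ** 2 / (2.0 * sigma ** 2))

def rbf_predict(x, centers, weights, sigma):
    """Evaluate eq. (2): f(x) = sum_i c_i * G(||x - t_i||).

    x       : input vector of previous samples, shape (d,)
    centers : RBF centers t_i, shape (M, d)
    weights : output weights c_i, shape (M,)
    sigma   : common Gaussian width (one shared width is an assumption)
    """
    distances = np.linalg.norm(centers - x, axis=1)   # ||x - t_i|| for each i
    return float(weights @ gaussian_rbf(distances, sigma))
```

With the network size reported in Section 4.2 (4 centers of dimension 4), `centers` would have shape (4, 4).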

We chose the RBF network for the nonlinear prediction
task for two main reasons:

• the computational cost of its training is very small
compared with other types of networks.

• it yields a regularized solution to the prediction
problem. This means that we seek a smooth
solution, which offers good predictions in the
regions where training data is not available. A
compromise exists between smoothness and closeness
to the data, which is controlled through a
regularization parameter. Specifically, the cost function
to minimize takes the form

$$H[f] = \sum_{n=0}^{N-1} \left( x_{n+1} - f(\mathbf{x}_n) \right)^2 + \lambda \|Pf\|^2 \qquad (4)$$

where $\|Pf\|^2$ is a functional of the solution that
introduces the smoothness constraint (in general,
this functional embeds the a priori knowledge
about $f$), and $\lambda$ is the regularization parameter.

We train the RBF network in two stages. First, initial
values for the centers and the variance are obtained
by means of a fast procedure, and the output weights
are determined via the pseudoinverse [7]. Second, this
solution is further refined by a few epochs of a gradient
descent method.
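The two-stage training can be sketched as below. This is a hedged illustration, not the procedure of [7]: the "fast procedure" for the centers and width is replaced by an assumed heuristic (evenly spaced training inputs as centers, mean inter-center distance as width), the smoothness functional $\|Pf\|^2$ is replaced by a simple weight penalty $\lambda\|\mathbf{c}\|^2$ (ridge regression), and the gradient stage updates only the output weights. All function names are hypothetical.

```python
import numpy as np

def embed(frame, order):
    # Build (x_n, x_{n+1}) training pairs from a frame of samples
    X = np.array([frame[n - order:n] for n in range(order, len(frame))])
    y = np.asarray(frame[order:], dtype=float)
    return X, y

def design_matrix(X, centers, sigma):
    # Row n holds G(||x_n - t_i||) for every center t_i (Gaussian of eq. (3))
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def train_rbf(frame, order=4, M=4, lam=10.0, epochs=2, mu=0.01):
    X, y = embed(np.asarray(frame, dtype=float), order)
    N = len(y)
    # Stage 1 (assumed heuristic): centers are evenly spaced training inputs,
    # width is the mean distance between distinct centers.
    centers = X[np.linspace(0, N - 1, M).astype(int)]
    pair_d = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
    sigma = pair_d[pair_d > 0].mean() if np.any(pair_d > 0) else 1.0
    G = design_matrix(X, centers, sigma)
    # Output weights by regularized least squares:
    # minimize ||y - G c||^2 + lam * ||c||^2 via an augmented lstsq problem.
    A = np.vstack([G, np.sqrt(lam) * np.eye(M)])
    b = np.concatenate([y, np.zeros(M)])
    c = np.linalg.lstsq(A, b, rcond=None)[0]
    # Stage 2: a few epochs of sequential gradient descent on the weights only.
    for _ in range(epochs):
        for n in range(N):
            e = y[n] - G[n] @ c
            c += mu * (e * G[n] - (lam / N) * c)
    return centers, sigma, c
```

Each gradient epoch loops over all N training pairs in order, which is why its cost scales with the frame length.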

The proposed predictor is the de-coupled serial
configuration shown in Figure 1. A comprehensive
discussion of this configuration has been reported
previously [4]; here we briefly summarize the two main
reasons for choosing it:

• it complements the linear prediction capabilities
with a nonlinear contribution, instead of removing
the linear basis and building a new global
nonlinear solution.

• it can easily be applied to analysis-by-synthesis
coders in a suboptimal way (by removing the
nonlinear part from the excitation selection
procedure) while still providing good results.

Experiments carried out to determine the optimum value
of the regularization parameter revealed that it is
convenient to switch the predictor dynamically between
two states, with or without the network (network on /
network off) [4]. In the first state, the predictor is the
one shown in Figure 1 and λ takes a value around 10
(the amplitude range of our speech signals is
[-32768, 32767]). In the second state, which corresponds
to a high value of λ, the network is disabled and the
predictor remains linear.


RBF prediction Final prediction 

Speech error error 



(a) 


Coded 



(b) 

Figure 1: The predictor: de-coupled serial configura¬ 
tion of RBF and linear predictors, (a) Analysis system, 
(b) Synthesis system. 
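The de-coupled serial configuration can be sketched as a pair of analysis and synthesis routines. The sketch assumes zero initial filter states, a generic nonlinear predictor `f` standing in for the trained RBF network, and fixed linear taps `a` applied to the RBF prediction error; `analysis` and `synthesis` are hypothetical names. Because the decoder runs the same predictors on the reconstructed signal, synthesis inverts analysis exactly when nothing is quantized.

```python
import numpy as np

def analysis(s, f, a, p):
    """Analysis: speech -> RBF prediction error r -> final prediction error e.

    f : nonlinear predictor on the p previous speech samples (hypothetical)
    a : linear predictor taps a_1..a_q applied to the RBF prediction error
    """
    s = np.asarray(s, dtype=float)
    sp = np.concatenate([np.zeros(p), s])          # zero history before start
    r = np.array([s[n] - f(sp[n:n + p]) for n in range(len(s))])
    q = len(a)
    rp = np.concatenate([np.zeros(q), r])
    # rp[n:n+q][::-1] = [r[n-1], ..., r[n-q]], aligned with a_1..a_q
    e = np.array([r[n] - a @ rp[n:n + q][::-1] for n in range(len(r))])
    return r, e

def synthesis(e, f, a, p):
    """Synthesis: excitation e -> linear synthesis filter -> add RBF prediction."""
    q = len(a)
    r = np.zeros(len(e) + q)   # leading zeros hold the initial filter state
    s = np.zeros(len(e) + p)
    for n in range(len(e)):
        r[n + q] = e[n] + a @ r[n:n + q][::-1]    # rebuild the RBF residual
        s[n + p] = r[n + q] + f(s[n:n + p])       # add back the RBF prediction
    return s[p:]
```

In the network-off state described above, `f` would simply return 0 and the cascade reduces to the linear predictor.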


4. DESIGN OF AN EFFICIENT CENP 
CODER 

The proposed optimization of the hybrid predictor
adaptation (reduction of the analysis frame length) is
investigated for a low-delay CELP-type coder, which we
call the CENP (Code-Excited Nonlinear Predictive) coder.
We first present a brief description of the CENP coder;
then we show the experiments carried out to improve the
performance-efficiency balance by reducing the frame
length; finally, we draw our conclusions.


4.1. OVERVIEW OF THE PRELIMINARY 
CENP CODER 

Figure 2 shows the block diagram of the CENP coder, whose
main characteristics are described next. Long-term
prediction is carried out by means of an adaptive codebook.
The stochastic codebook contains 1024 vectors. Both
predictor and excitation adaptations are performed once
every 2.5 ms. The predictor adaptation is backward: the
predictor parameters need not be transmitted, since they
are updated from the coded speech, which is also available
at the decoder. As a result, including the nonlinear
prediction does not increase the bit rate; the quality
improvements come only at the cost of the computational
effort required for network training. The final bit rate
is in the range 8-16 kb/s, depending on the quantization
scheme and on the predictor and excitation updating
rates.
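As a rough illustration of part of the bit budget, the arithmetic below assumes that one stochastic codebook index is sent per 2.5 ms excitation update and uses the 8 kHz sampling rate given in Section 4.2. Gains and adaptive-codebook delays, whose quantization is not specified here, account for the rest of the 8-16 kb/s range; the predictor itself costs no bits because of the backward adaptation.

```latex
\text{vector length} = 2.5\,\text{ms} \times 8\,\text{kHz} = 20\ \text{samples}, \qquad
\text{index rate} = \frac{\log_2 1024\ \text{bits}}{2.5\,\text{ms}}
                  = \frac{10\ \text{bits}}{2.5\,\text{ms}} = 4\ \text{kb/s}
```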

Figure 2: Block diagram of the CENP coder.

The excitation is selected as in typical CELP coders;
in our case, however, the objective of the search
procedure is that the output of the linear filter matches
the nonlinear prediction residual (see Figure 1(b)).
Perceptual weighting of the quantization error has not
been included yet and is left for further work.

The decision whether to include the network is based
on the results of a simplified synthesis. This synthesis
is carried out for both cases (network-on and network-off),
and the one that produces coded speech closer to the
original is chosen. The simplified procedure is as
follows: for the network-off case, the adaptive
contribution (delay and gain) is determined considering
all possible delays; for the network-on case, the delay
is sought only around the delay obtained in the
network-off case. The stochastic contribution is
calculated for both cases using only 12% of the whole
stochastic codebook. For the final synthesis, the search
in the stochastic codebook is completed for the winning
option. Therefore, the decision is made at the expense
of an increase of around 12% in the computational cost
of the excitation selection. Obviously, this decision
procedure is suboptimal, and there is room for
investigating new methods.
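The network-on/network-off decision and its cost can be sketched as follows. This is a minimal illustration, not the coder's actual implementation: `choose_network_state` and `excitation_search_cost` are hypothetical helpers, closeness is measured by squared error (an assumption), and the 12% overhead is reproduced under the assumption that the partial search already done for the winning option is reused when its search is completed.

```python
import numpy as np

def choose_network_state(original, coded_on, coded_off):
    """Return "on" or "off": the state whose simplified synthesis is closer
    to the original speech (squared error is an assumed closeness criterion)."""
    s = np.asarray(original, dtype=float)
    err_on = np.sum((s - np.asarray(coded_on, dtype=float)) ** 2)
    err_off = np.sum((s - np.asarray(coded_off, dtype=float)) ** 2)
    return "on" if err_on < err_off else "off"

def excitation_search_cost(codebook_size=1024, fraction=0.12):
    """Vectors examined relative to a single full search, assuming the
    winner's partial search is reused, so only the loser's is extra."""
    partial = round(codebook_size * fraction)    # vectors tried per state
    total = partial + codebook_size              # loser's partial + winner's full
    return total / codebook_size
```

With the default values, 123 of the 1024 vectors are tried per state and the relative cost is about 1.12, matching the overhead quoted above.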

4.2. INCREASING THE EFFICIENCY 

Initially, the procedure proposed to optimize the learning
consisted of starting each frame's training from the final
network parameters of the previous frame [4]. This
approach has an important advantage: when the predictor
is updated frequently, the slow frame-to-frame variation
of the input signal is easily tracked, since the training
starts closer to its objective. However, we have also
found a significant disadvantage: when the characteristics
of the signal change rapidly from one frame to the next,
the training sometimes gets trapped in a local minimum
and takes several frames to escape from it.

To solve this problem, the training algorithm must be
given enough agility to avoid remaining trapped in a local
minimum for several frames; we achieve this by
reinitializing the training every frame. After this change,
the performance of the CENP coder was evaluated using
different numbers of epochs in the second stage of the
training. The results show that only two epochs are
enough to reach a level of performance very close to
that obtained with ten or more epochs. Therefore, the
initial solution from which the gradient descent algorithm
starts is very close to an acceptable solution; and even
though this procedure performs worse than the first one
at tracking slow variations, the second stage (gradient
descent algorithm) might be eliminated altogether,
resulting in important computational savings.

The network size has been chosen to maximize
performance, resulting in an optimum size of 4 centers
of dimension 4. Thus, the network uses only the 4
previous samples to predict the present one. This
benefits our approach, since the analysis frame length
may be shorter than that needed for 10th-order
prediction. It is important to note that the frame length
is shortened only for training the RBF network; the
original length is kept to adapt the cascaded linear
predictor, which has 10 taps.

We explore the possibility of reducing the nonlinear
analysis frame length (note that this reduction affects
only the network training). To that end, we code a
moderate database of speech signals (four speakers,
each pronouncing 4 sentences of about 2 s, for a total
of 16 sentences) using different frame lengths. To avoid
tying the results to either a particular quantization
scheme or a suboptimal procedure for dynamically
deciding whether the network is on or off, the simulations
are carried out without quantization, and the decision
about enabling the network is made from the full
synthesis with both options. Under these conditions,
we have found that performance is maintained for frame
lengths from 180 down to 100 samples (the sampling rate
being 8 kHz); below 100 samples, performance begins to
decrease. However, the degradation is acceptable for
frame lengths of 80 and even 60 samples; specifically,
for 60 samples the segmental SNR decreases by around
0.05 dB, while the computational effort is three times
smaller than that corresponding to the preliminary
length of 180 samples.
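Since Section 2 shows that the training cost grows in proportion to the frame length $N$, the factor of three follows directly; the same arithmetic at the 8 kHz sampling rate gives the corresponding frame durations:

```latex
\frac{\text{cost}(N = 180)}{\text{cost}(N = 60)} = \frac{180}{60} = 3, \qquad
\frac{180}{8\,\text{kHz}} = 22.5\,\text{ms}, \qquad
\frac{60}{8\,\text{kHz}} = 7.5\,\text{ms}
```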

With such a reduction of the computational cost, we
estimate that the network training takes about four
times as much computation as the calculation of the
linear predictor coefficients, provided that an appropriate
hardware implementation exploits the parallelism of
the network. We believe this increase in computational
cost is acceptable, since LPC analysis consumes only
about 2% of the typical DSP computation of a CELP
coding-decoding system.
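Combining the two figures quoted above gives a rough estimate of the overall training overhead, under the assumption that the 2% LPC share also applies to the CENP coder:

```latex
\text{network training} \approx 4 \times 2\% = 8\%
\ \text{of the typical CELP coding-decoding computation}
```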

Figure 3 shows the results for frame lengths of 100,
80, 60, 40, and 20 samples, for several values of the
regularization parameter λ around 10. The performance
of a CELP coder appears as a solid line for reference.
The only difference between this CELP coder and our
CENP coder is the predictor: the former employs a
10-tap linear predictor, while our approach uses the
hybrid predictor described above. As noted previously,
the performances achieved for frame lengths greater
than or equal to 60 samples are very similar; as a result,
a 60-sample frame length seems a good choice. Regarding
the influence of λ, Figure 3 shows that the best choice
is λ = 10, since performance decreases as this parameter
is increased.

The performances achieved by the CENP coder operating
at 10800 bps for frame lengths of 180 and 60 samples
are shown in Table 1, in terms of segmental SNR. The
performance obtained by a CELP coder at the same bit
rate is also shown for comparison. As can be seen, both
versions of the CENP coder show an increase of around
0.4 dB in segmental SNR over the CELP coder.
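Segmental SNR, the figure of merit in Table 1, averages the per-segment SNR in dB. The segment length is not given in the text; the sketch below assumes 160 samples (20 ms at 8 kHz), discards a trailing partial segment, and does not clamp per-segment values, all of which are assumptions. `segmental_snr` is a hypothetical helper.

```python
import numpy as np

def segmental_snr(original, coded, seg_len=160, eps=1e-12):
    """Mean of per-segment SNRs in dB (seg_len is an assumed value;
    a trailing partial segment is discarded)."""
    s = np.asarray(original, dtype=float)
    y = np.asarray(coded, dtype=float)
    K = len(s) // seg_len
    snrs = []
    for k in range(K):
        seg = slice(k * seg_len, (k + 1) * seg_len)
        signal = np.sum(s[seg] ** 2)
        noise = np.sum((s[seg] - y[seg]) ** 2)
        # eps guards against division by zero for silent or perfect segments
        snrs.append(10.0 * np.log10((signal + eps) / (noise + eps)))
    return float(np.mean(snrs))
```

Reading Table 1 with this measure, the CENP gains over CELP are 12.60 - 12.17 = 0.43 dB at 180 samples and 12.56 - 12.17 = 0.39 dB at 60 samples, consistent with the figure of around 0.4 dB.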
Figure 3: Coding performance vs. regularization
parameter for different nonlinear analysis frame lengths
(in samples): 100, 80, 60, 40, and 20.



Coder    Frame length (samples)    SNRSEG (dB)
CENP     180                       12.60
CENP     60                        12.56
CELP     -                         12.17


Table 1: Performance of the CENP coder at 10800
bps for frame lengths of 180 and 60 samples. The
performance of a CELP coder at the same bit rate
is also shown for comparison.


5. CONCLUSIONS AND FURTHER WORK 

Starting from a preliminary CENP coder using a
nonlinear analysis frame length of 180 samples, we have
studied the performance-complexity balance obtained by
varying the frame length. The experiments show that
the frame length can be reduced by a factor of three
(down to 60 samples) without compromising performance.
This reduction lowers the initial computational cost
accordingly.

We have also modified the initial training algorithm
(which started each frame from the parameters of the
previous frame) so that it is reinitialized every frame.
Surprisingly, we have found that only two epochs of the
gradient descent algorithm are enough to reach an
acceptable solution. This result suggests the possibility
of eliminating the second stage of the network training
altogether; in that case, reducing the frame length would
still be very valuable for optimizing the first stage.


Although the computational cost is still about four
times that of the linear procedure, these results are
encouraging, as we consider that the proposed coder
can be implemented in real time with minor
improvements: as mentioned above, refining the first
stage of the network training and skipping the gradient
stage may result in significant savings.

Including the network in the analysis-by-synthesis
procedure is necessary to properly incorporate perceptual
weighting filtering. With this extension, a meaningful
subjective quality test could be carried out. A preliminary
approach could consist of obtaining the gain associated
with each vector by the suboptimal procedure currently
used, i.e., without taking into account the nonlinear part
of the synthesis system, and then, using these gains,
selecting the excitation vector that leads to the coded
speech closest to the original.

ABSTRACT


In this paper we introduce the use of Boundary Methods
(BMs) for distribution analysis. We view these methods
as tools that can be used to extract useful information
from sample distributions. We believe that the information
thus extracted is useful for a number of applications;
in particular, we discuss the use of boundary methods for
determining the suitability of a particular feature set
for pattern classification. We present results that
establish the correspondence between BMs and the
probability of error (Pe) for normal distributions.

1. INTRODUCTION 

For many investigations of physical processes, scientists
and engineers must use samples drawn from the process
to construct algorithms that model or monitor the
underlying process. For example, in the telecommunications
industry, applications such as equalization (e.g., echo
cancellation), source coding (e.g., video coding using
vector quantization), and detection (e.g., CDMA
decorrelators) require that samples of transmitted signals
be analyzed to formulate appropriate signal processing
algorithms. For problems such as these, we believe that
the distribution analysis methods we describe will offer
significant advantages in designing and fielding robust
and efficient algorithms, particularly those in which
classification plays a dominant role.

(*) Support for this work was provided by the AMOS Research
consortium. Graduate student support was provided to
Bill Pierson through the DoD Palace Knight Program. Stan
Ahalt is currently on sabbatical leave at Universidad Politecnica
[...] reccion General de Investigacion Cientifica y Tecnica, Ministerio
de Educacion y Ciencia of España.
If an investigator has a reasonably complete understanding
of the physical process, a mathematical model can be
constructed and the samples can be used to estimate the
parameters of the model. If the samples are sufficiently
numerous with respect to the dimensionality and the
statistics of the problem, the needed pdfs can be
estimated, and an optimal Bayesian classifier can be
constructed [1, 2].

However, in many practical situations there are
problems with this approach. First, constructing a
model can be time-consuming, and verifying the model
can be problematic. Second, as the dimensionality of
the data increases, exponentially larger numbers of
samples are required to accurately estimate the
class-conditional probabilities. It is often either
physically impossible or financially prohibitive to obtain
the needed data. Thus, accurate estimates of the
probability density functions cannot be obtained, which
implies that the Bayes error cannot be accurately
estimated. Third, estimating the prior probabilities
becomes especially difficult when the number of classes
is large, which is common in many practical applications.
One approximation is to assume a uniform distribution
for the prior class probabilities [1], which simplifies the
analysis. However, for optimal performance, the class
probabilities need to be estimated accurately in order to
apply Bayesian analysis. For these reasons, investigators
must turn to other alternatives for many practical
problems.

For those cases in which the process cannot be readily
modeled, either supervised or unsupervised learning
can be employed. In either case, the learning process
can be viewed as distribution analysis. If only unlabeled
data is available, e.g., when the number of classes
is unknown, unsupervised learning techniques, usually
based on clustering, are used to discover a model that
captures the structure of the data in the data space.
Clusters thus formed can be evaluated, e.g., using
Indices of Partitional Validity (IPVs) [3, 4], in an attempt
to measure how well the clusters capture the structure
of the data. Usually these indices use some combination
of measures that quantify the compactness and
isolation of each of the discovered clusters. While these
methods have proven useful [5, 4], they require an
explicit distance metric, and the choice of this metric
can have a significant impact on the reliability or utility
of the analysis.

When labeled data is available, as we assume here,
it is standard practice to use mixture decomposition
techniques to allocate each pattern to a particular
cluster and then estimate the cluster parameters. These
techniques generally require that the number of clusters
be known, and they adopt a density model that assumes
the clusters are multivariate normal. Mixture
decomposition techniques then focus on 1) assigning each
data sample to the correct cluster, and 2) estimating
the mean and covariance matrix of each cluster. A
particularly complete discussion of these techniques can
be found in [4].
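With labels given, step 1) is supplied by the labels and step 2) reduces to per-class sample statistics. A minimal sketch, assuming the rows of `X` are samples and `labels` holds one class label per row; `class_statistics` is a hypothetical helper, and `np.cov` with `rowvar=False` uses the unbiased (N-1) normalization.

```python
import numpy as np

def class_statistics(X, labels):
    """Estimate the mean vector and covariance matrix of each labeled cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    stats = {}
    for c in np.unique(labels):
        Xc = X[labels == c]                      # samples of class c
        stats[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return stats
```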

2. MOTIVATION 

Our research is focused on determining measures that we
believe are of significant importance to designers of
pattern classifiers. In particular, we are interested in:

• Classifier Independent Discriminant Measures
(CIDMs), which yield a numeric result indicating
how separable two classes, or the features
derived from two classes, are. In particular, we
are searching for CIDMs that can be directly
related to the probability of error (Pe) or other
pertinent measures.

• Feature-Set Evaluation (FSE) techniques which,
given alternative ways of deriving feature sets
from observations, order those sets by classification
fitness. Of course, this measure of fitness
should also be related to Pe or other pertinent
measures.

• Sample-Pruning (SP) techniques, which support
the development of classifiers that are:

- quickly constructed,

- execution-efficient,

- generalized, and

- robust.

We observe that a CIDM would ideally analyze all the
data contained in the sample population and yield one
value denoting how separable the two classes are. FSEs,
on the other hand, analyze at least two sample populations
and yield two values that can be compared to determine
which of the two populations consists of better features.
Finally, SP techniques operate on one population and
yield another population that is always a subset of the
original.

Further, to clarify the third point above, we observe
that a) to construct a classifier quickly, we need to
minimize the use of samples that provide little useful
classification information; b) execution-efficient
classifiers are those constructed such that only a small
number of parameters must be evaluated to reach a
classification decision; c) generalized classifiers reliably
estimate the classification mapping from noisy training
samples; and d) robust classifiers reliably estimate the
classification mapping when the noise process affecting
the training samples differs from the noise process
affecting the testing samples.

While we do not claim to have solved any of the
above problems, we believe that the distribution analysis
technique we discuss here, BM, offers a significant
benefit in meeting the objectives of FSE and SP. In
this paper, however, we restrict our discussion to the
use of BM for FSE. We also note that the statistics
community has an extensive literature on the use of
linear methods, particularly principal component analysis;
however, we are not aware of more directly related FSE
work in that community. Similar work in the Neural
Networks and pattern recognition communities also relies
on the identification of individual features. An excellent
review of some of these techniques can be found in [6].

3. GENERAL DESCRIPTION OF THE 
BOUNDARY METHOD 

This method exploits the distributions in a controlled
way to extract useful information about how the
distributions are arranged relative to each other.
Suppose we have a hypothetical population consisting
of two distributions drawn from two classes, as shown
in Fig. 1(a).

We enclose each class with a boundary, according
to some criterion. In the case shown in Fig. 1(b), we
have drawn the enclosing boundaries as closed
interpolated splines with approximately 15 control knots.
Generally, the criterion used in forming the boundaries
excludes obvious outliers and yields relatively compact
boundaries.

While the boundaries shown are quite complicated,
it is reasonably easy to specify and manipulate
complicated boundaries using methods such as interpolated
splines. Indeed, the boundaries can be determined in
any way that is computationally feasible and can be
of arbitrary shape, with suitable modifications to the
basic algorithm. In the examples presented later we
use elliptical boundaries, since they are very simple to
specify and manipulate.

Figure 1: a) A population of two distributions drawn
from two classes; b) distributions enclosed by arbitrary
boundaries; c) trajectory of samples enclosed by
increasing boundaries.

We then shrink the boundaries until they are just
touching. For complicated boundaries, the amount of
shrinking needed to obtain tangential, or "just-touching",
boundaries can be relatively difficult to calculate, and
there can be multiple solutions. However, for many
boundaries, such as the ones shown, a straightforward
search over the scale of each boundary, e.g., about its
center point, can be used to establish the tangential
boundaries. In any case, we always establish a canonical
starting point at which the boundaries enclose the most
samples from each class without any overlap of the
volumes enclosed within the boundaries.
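For elliptical boundaries, the search over scale can be sketched as a bisection. This sketch makes several assumptions not fixed by the text: it works in two dimensions, each class ellipse is $\{\mathbf{x} : (\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \le s^2\}$ for a class mean and covariance (for example, the sample statistics), both ellipses share a common scale $s$ about their centers, and the overlap test samples the boundary of one ellipse at a finite number of angles. The helper names are hypothetical.

```python
import numpy as np

def mahalanobis(X, mu, S_inv):
    # Ellipse "radius" of each point: sqrt((x - mu)^T S^-1 (x - mu))
    D = np.atleast_2d(X) - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", D, S_inv, D))

def overlap(s, mu1, S1, mu2, S2, n_angles=3600):
    """True if the two class ellipses at common scale s intersect (2-D)."""
    S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
    # If either ellipse contains the other's center, they certainly overlap
    if (mahalanobis(mu2, mu1, S1_inv)[0] <= s
            or mahalanobis(mu1, mu2, S2_inv)[0] <= s):
        return True
    # Otherwise they intersect iff the boundary of ellipse 1 enters ellipse 2
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    boundary1 = mu1 + s * circle @ np.linalg.cholesky(S1).T
    return bool(np.any(mahalanobis(boundary1, mu2, S2_inv) <= s))

def tangential_scale(mu1, S1, mu2, S2, iters=60):
    """Bisection on the common scale s for "just-touching" ellipses."""
    lo = 0.0
    hi = mahalanobis(mu2, mu1, np.linalg.inv(S1))[0]   # mu2 on ellipse 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if overlap(mid, mu1, S1, mu2, S2):
            hi = mid
        else:
            lo = mid
    return hi
```

The bisection is valid because the scaled ellipses are nested in $s$, so once they overlap they keep overlapping as the scale grows.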

We choose to use tangential boundaries for the
following reasons. First, shrinking the boundaries until
they are tangential establishes a specific point at which
to begin our calculations and allows us to normalize our
results, as explained later. Second, the tangential
boundaries enclose subsets of the class samples that, we
believe, most reliably represent the class distributions.
Finally, as briefly discussed later, tangential elliptical
boundaries have a direct relationship to Fisher's LDA
projection axis.

At this point we have established the two values
to be used as the end-points in our final calculations:
the original enclosure size, labeled Rt.X(n), and a
minimum, non-overlapping size, labeled Rt.X(0).

We now begin to grow the boundaries. We grow them
gradually over a number of steps, say n, sufficient to
obtain the desired trajectory, as described below. We
refer to the area under the resulting trajectory as the
Trajectory Area (TA). As the boundaries grow, the number
of samples enclosed from each class increases, and so
does the number of samples enclosed in the region common
to both boundaries. We keep track, via a count, of how
the number of samples in the overlap region, or within
either boundary, increases, regardless of class. We note,
however, that the samples can be weighted so that the
closer a sample is to the boundary, the more it
contributes to the count.
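The growing stage can be sketched for elliptical boundaries. The sketch makes several assumptions, all of them ours. Each class boundary is the Mahalanobis ellipse {x : d(x) <= r}. The radius grows linearly from the tangential value r0 to the original value rn in n steps. The count is the number of pooled samples inside either boundary, normalized by the count at the final size. The names `trajectory` and `trajectory_area` are hypothetical, and the optional distance weighting mentioned above is left out.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of each row of x from mean under covariance cov."""
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, sol))

def trajectory(x, classes, r0, rn, n=50):
    """Normalized count of enclosed samples as the ellipses grow from r0 to rn.

    x       : (N, d) samples from all classes, pooled (class labels ignored)
    classes : list of (mean, cov) pairs, one elliptical boundary per class
    Returns (t, y): normalized step t in [0, 1] and the enclosed count y,
    scaled so that y = 1 at the original boundary size rn.
    """
    # Distance of every sample to every class ellipse: shape (N, n_classes).
    d = np.stack([mahalanobis(x, m, c) for m, c in classes], axis=1)
    dmin = d.min(axis=1)  # a sample is enclosed if it is inside any boundary
    radii = np.linspace(r0, rn, n + 1)
    counts = np.array([(dmin <= r).sum() for r in radii], dtype=float)
    t = np.linspace(0.0, 1.0, n + 1)
    return t, counts / counts[-1]

def trajectory_area(t, y):
    """Trajectory Area (TA) by the trapezoidal rule."""
    return float(np.sum(np.diff(t) * (y[1:] + y[:-1]) / 2.0))
```

With this normalization, a population whose samples are mostly enclosed already at r0 gives a TA close to 1, matching the "bigger area, easier to classify" reading below.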

Once we have expanded the boundaries out to their
original position, we have captured, in the count, a
measure of how the samples are distributed in the space.
We now plot the trajectory, as shown in Fig. 1(c), where
we have plotted two hypothetical trajectories for two
different populations of distributions of two classes.

We claim that the area under such a trajectory is a
measure of how the samples are distributed, in the
following sense: the areas for two different sample
populations can be quantitatively compared to one
another, and the relative ordering of the areas is
invariant with respect to linear transformations applied
to the data distributions.
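The next check shows one property consistent with this invariance claim. It assumes the boundaries are ellipses built from each class's estimated mean and covariance. Under that assumption, Mahalanobis distances, and therefore which samples any scaled ellipse encloses, do not change under an invertible linear transformation of the data. This is a quick numerical check, not the authors' argument. The helper name and the matrix `A` are arbitrary choices of ours.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of each row of x."""
    diff = x - mean
    return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)

rng = np.random.default_rng(1)
x = rng.normal(size=(300, 2)) @ np.array([[2.0, 0.3], [0.0, 0.5]])
A = np.array([[1.5, -0.7], [0.4, 2.0]])  # an arbitrary invertible map

# Estimate mean and covariance before and after the transform y = A x.
d_before = mahalanobis_sq(x, x.mean(0), np.cov(x.T))
y = x @ A.T
d_after = mahalanobis_sq(y, y.mean(0), np.cov(y.T))
print(np.allclose(d_before, d_after))  # True
```

The check works because the estimated covariance transforms as A C Aᵀ, which cancels the A applied to each difference vector.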

For population A, the samples enclosed by the minimal
boundary, Rt.X(0), are relatively compact (76% are
enclosed when the ellipses are tangential), and the
number of enclosed samples quickly approaches the number
of samples enclosed by the final (i.e., the original)
boundary; thus the classes are, to some degree, isolated.

In contrast, population B has a trajectory indicating
that the distributions are not very compact (only 22%
are enclosed by the tangential boundaries) and that the
two classes are fairly interspersed, since the trajectory
climbs slowly to the final value; hence the classes are
not well isolated.

By comparing the areas (TA) under the two trajectories
we have a qualitative measure of how well suited the two
distributions are for classification: Population A,
which has the larger area, will be more easily
classified than Population B. We have, effectively, a
method for FSE.

4. RESULTS 

In this section we show two simple experimental results
using Boundary Methods. The first is shown in Fig. 2,
where we have selected two test populations in which
one population is more separable than the other.



Figure 2: Figures a) and b) show two test populations (200 
samples of each class) with similar separability; Figure c) 
shows the trajectory areas of populations a) and b). We 
can see they are very similar. 

For this population we use elliptical boundaries and
calculate the Trajectory Area (TA) using the counts of
samples within the overlapping boundaries. The initial
boundary for each class is an ellipse whose size is
determined using a chi-square test, by fixing the
percentage of enclosed samples. Since we know that the
data consist of two unimodal classes, we have simply
estimated the means and covariances directly. If the
data were not unimodal, we could either use a single
boundary (which would result in a different trajectory)
or employ unsupervised techniques to determine the
number of clusters per class to use when estimating
these parameters.
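The next sketch shows the initial-boundary step for two-dimensional data. It assumes the class is Gaussian, so the squared Mahalanobis distance follows a chi-square law with 2 degrees of freedom. In that case the radius that encloses a chosen fraction p has the closed form r² = −2 ln(1 − p). For general dimension d, `scipy.stats.chi2.ppf(p, d)` gives r² instead. The function name `initial_ellipse` is hypothetical.

```python
import numpy as np

def initial_ellipse(samples, p=0.95):
    """Mean, covariance and Mahalanobis radius of a 2-D ellipse intended to
    enclose a fraction p of a Gaussian class.

    For d = 2 the chi-square CDF is 1 - exp(-r^2 / 2), so r^2 = -2 ln(1 - p).
    """
    if samples.shape[1] != 2:
        raise ValueError("the closed-form radius assumes 2-D data")
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    r = np.sqrt(-2.0 * np.log1p(-p))
    return mean, cov, r
```

For p = 0.95 this gives r ≈ 2.448, the familiar 95% ellipse for a bivariate normal.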

The use of ellipses is attractive for several reasons.
First, the underlying assumption that the data are
normally distributed is reasonably well satisfied in many
realistic cases [6]. Second, this assumption simplifies
analysis and gives rise to elliptical boundaries, because
Gaussian distributions have elliptical constant-density
contours; quadratic forms such as ellipses lend
themselves to formal analysis and are closely tied to
Bayesian formalisms. Third, ellipses are computationally
attractive because they are easily manipulated: each can
be specified by a mean, a covariance matrix, and a
volume. Fourth, for ellipses, the amount of shrinking
necessary to obtain tangential, or "just-touching",
boundaries is relatively easy to calculate (although
there are multiple possible solutions).

There are a number of ways the ellipses can be
collapsed. We typically collapse the boundaries so that
the constant-density contour levels of the two classes,
which fix the volume of each ellipse, are kept equal. A
simple search procedure will yield this solution
quickly, since only one parameter needs to be varied.
Alternatively, the density contour of each class
distribution can be varied separately in order to find
tangential ellipses with other properties. For example,
it is possible to find tangential ellipses that have
equal-magnitude gradients at the tangent point, and that
gradient is equivalent to Fisher's linear discriminant
projection vector.
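The next sketch is one way to carry out the one-parameter search. It rests on our assumption that both ellipses shrink by a common Mahalanobis radius r. Classes with different covariance determinants would need separate radii to keep equal density levels. Under that assumption the just-touching radius is r* = min over x of max(d1(x), d2(x)). A minimizer of that maximum also minimizes a weighted sum t·d1² + (1 − t)·d2² for some t in [0, 1], and that weighted minimizer has a closed form x(t). So bisecting on t until d1 = d2 finds the tangent point. This is a sketch of the idea, not necessarily the authors' search procedure.

```python
import numpy as np

def tangential_radius(m1, c1, m2, c2, tol=1e-10):
    """Common Mahalanobis radius at which two class ellipses just touch.

    Walks the curve x(t) = argmin t*d1^2 + (1 - t)*d2^2 and bisects on t
    until d1(x) == d2(x). Returns (radius, tangent point).
    """
    p1, p2 = np.linalg.inv(c1), np.linalg.inv(c2)
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)

    def point(t):
        a = t * p1 + (1.0 - t) * p2
        return np.linalg.solve(a, t * p1 @ m1 + (1.0 - t) * p2 @ m2)

    def dist(x, m, p):
        diff = x - m
        return float(np.sqrt(diff @ p @ diff))

    lo, hi = 0.0, 1.0  # x(0) = m2, where d1 > d2; x(1) = m1, where d1 < d2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        x = point(mid)
        if dist(x, m1, p1) > dist(x, m2, p2):
            lo = mid  # x is still deeper inside ellipse 2: move toward m1
        else:
            hi = mid
    x = point(0.5 * (lo + hi))
    return dist(x, m1, p1), x
```

For identical covariances, x(t) moves along the straight segment between the means. The tangent point is then where the two Mahalanobis distances are equal.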

Our first example, shown in Fig. 2, demonstrates how the
areas under the trajectories (TA) correctly indicate
that the first test population is more readily separated
than the second.

Another example of the technique is shown in Fig. 3. In
this case the means of the distributions are more widely
separated in the data space. Note that here the
trajectories, shown in Fig. 3(c), and their associated
trajectory areas correctly indicate that the first test
population is more readily separated than the second,
and that the difference in separability between the two
test populations is more pronounced than in our first
example, as is visually apparent from the data.

As can be seen, our preliminary tests indicate that the
Boundary Method works well for these simple
distributions. We are now working on more extensive
tests, as well as more formal analyses.

Figure 3: a) and b) show two test populations (200
samples of each class) with more widely separated means
than in Figure 2; c) shows the trajectory areas of
populations a) and b). Note that the difference in the
areas is greater in this second example, because of the
more widely separated means and differing covariances.

The second experiment demonstrates the correlation
between the trajectory area (TA) and the probability of
error (Pe). Fig. 4 shows the correlation between TA and
Pe for a number of distributions with different means
and covariance matrices. We show two equivalent
simulations, differing only in that Fig. 4(a) uses the
actual means and covariance matrices, while Fig. 4(b)
uses estimated means and covariance matrices.
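The paper does not say how Pe was obtained. As an assumed reference point, take two equiprobable Gaussian classes that share a covariance Σ. Their Bayes error has the closed form Pe = Φ(−Δ/2), where Δ is the Mahalanobis distance between the means. TA and Pe can then be compared with Pearson's correlation coefficient. The function names are ours, and none of the paper's numbers are reproduced here.

```python
import math
import numpy as np

def bayes_error_equal_cov(m1, m2, cov):
    """Bayes error for two equiprobable Gaussians with a shared covariance."""
    diff = np.asarray(m1, float) - np.asarray(m2, float)
    delta = math.sqrt(diff @ np.linalg.solve(cov, diff))
    # Phi(-delta/2), written with the complementary error function.
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(a, b)[0, 1])
```

In use, one would compute a TA and a Pe for each simulated distribution pair and pass the two lists to `pearson`.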

5. CONCLUSIONS 

We have presented a new method for Feature Set
Evaluation (FSE) that provides information useful in
determining how separable one population is with respect
to others drawn from the same source. This collection of
methods is called Boundary Methods (BM) because
arbitrary boundaries can be employed to investigate
various separating surfaces. The relationship among the
classes is captured in a single number called the
Trajectory Area (TA). We have given results of
simulations using Gaussian distributions and elliptical
boundaries which show that the BM-TA has a correlation
coefficient close to one with Pe.
ABSTRACT

In this work we have made use of automata theory in
training Artificial Neural Networks (ANNs), with the
practical application of phoneme recognition in voice
signals. Our aim is to obtain Recurrent ANN structures
(RANNs) that make use of the inherent sequentiality of
the voice signal and at the same time are easy to train,
by simultaneously viewing two spaces: the output space
and the state space. These spaces correspond to those of
a finite state automaton, generalized by the network,
that detects a given feature pattern. This methodology
has been, and is being, applied to the construction of a
modular speech recognizer for Spanish phonemes. The
modularization of the recognition has the clear
advantage of facilitating training in each submodule of
the network, and at the same time it injects knowledge
into the global structure of the recognizer.

1. INITIAL WORK 

Our starting point has been the use and generation of
recurrent ANNs for the processing of binary signals in a
purely sequential mode. We have employed recurrent
topologies in order to avoid static ontologies when
solving signal processing problems that are intrinsically
dynamic. The main reason for processing the signal
sequentially is to avoid or minimize the effect of the
windowing process, in which a set of signal parameters is
presented to a network as parallel inputs. The drawback
of windowing is that it requires the correct selection of
both the representative parameters and the appropriate
window size.

In order to avoid these problems, instead of adopting a
preset architecture and topology [1] [2], we have chosen
an evolutionary strategy that determines the most
adequate network structure for each particular problem.
Thus, we initially used our Genetic Algorithm based
application development environment, GENIAL [3], to
generate RANNs that detect bit patterns in binary
signals [4].

However, generalizing the methodology to non-binary
signals has two drawbacks. First, it requires a training
set in which the number of analog input-output pairs may
be infinite over a continuum of values. Second, we must
establish within this set a gradation of the similarity
between each input and the pattern to be recognized, in
order to establish an equivalent gradation in the
response of the network. These two facts enormously
complicate the search space, generating multiple local
minima and deceptive points, so that our genetic
algorithm takes too long to find a structure that solves
the problem.
2. GENERALIZATION OF FINITE STATE
AUTOMATA FOR THE DETECTION OF 
ANALOG SIGNAL SEQUENCES 

Our solution has been to constrain the search space by
limiting the possible network topologies. Thus, we have
concentrated on structures that simulate finite state
automata, i.e., automata that detect an input pattern
with strict, well-defined rules (transition and emission
matrices) [5]. In other words, we have introduced
knowledge by establishing topologies that are appropriate
for the increased complexity, without losing the basic
characteristics of recurrent topology and sequential
processing of the inputs.

The starting idea is to obtain a non-recurrent network
for training that is equivalent to the finite state
automaton that determines the presence or absence of a
given pattern in the sequentially processed inputs. Our
equivalent network must generate not only the outputs of
the automaton for each input it receives, that is, each
emission of the automaton, but also the internal state of
the automaton after each transition. In other words, the
equivalent network generates values in two different
spaces, the output space and the state space, both
encoded in one or several nodes of the network. The
training set must contain the cases that determine the
transition rules among the finite number of states and
the emissions produced in each transition. In use,
however, the network is recurrent, since its inputs are
made up of the signal values and the coding of the
previous state, and it produces the current outputs and
state.
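The toy example below illustrates the two spaces. It assumes a binary automaton that detects the bit pattern 1-0-1, which is our example and not one from the paper. The training set is read directly off the transition and emission tables as (input, previous state) → (emission, next state) pairs. At run time, a non-recurrent step function is applied repeatedly, with its state output fed back as the next input. All names are hypothetical. `table_step` stands in for a trained network.

```python
# Hypothetical automaton detecting the bit pattern 1-0-1 (overlaps allowed).
# States: 0 = nothing matched, 1 = saw "1", 2 = saw "10".
TRANSITION = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 1, (2, 0): 0, (2, 1): 1}
EMISSION = {key: 1 if key == (2, 1) else 0 for key in TRANSITION}
N_STATES = 3

def one_hot(state):
    return tuple(1 if i == state else 0 for i in range(N_STATES))

def training_pairs():
    """(input bit, encoded previous state) -> (emission, encoded next state)."""
    return [((bit, one_hot(s)), (EMISSION[(s, bit)], one_hot(TRANSITION[(s, bit)])))
            for (s, bit) in sorted(TRANSITION)]

def run(step, bits, state=0):
    """Apply a non-recurrent step function recurrently, feeding the state back."""
    outputs = []
    for bit in bits:
        out, state = step(bit, state)
        outputs.append(out)
    return outputs

def table_step(bit, state):
    # Stand-in for the trained network: here simply the automaton's own tables.
    return EMISSION[(state, bit)], TRANSITION[(state, bit)]
```

For example, `run(table_step, [1, 0, 1, 0, 1])` returns `[0, 0, 1, 0, 1]`: the pattern is detected twice, because overlapping matches are allowed.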

For the training process, we must construct a training
surface in a representation in which one of the axes is
the state space and the other the output space.
Depending on how these training surfaces are
constructed, we can force different types of
generalization in our networks, according to what we
wish to reflect in our outputs: sharp or smooth areas in
the training surface around the pattern to be recognized
allow, respectively, a sharp detection or a similarity
classification of the pattern.
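The sharp/smooth distinction can be made concrete with two target functions over the distance between an input and the pattern. The threshold `eps` and width `sigma` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sharp_target(distance, eps=0.1):
    """1 only inside a narrow neighbourhood of the pattern: sharp detection."""
    return (np.asarray(distance) <= eps).astype(float)

def smooth_target(distance, sigma=0.5):
    """Decays gradually with distance: a similarity-graded response."""
    d = np.asarray(distance, dtype=float)
    return np.exp(-0.5 * (d / sigma) ** 2)
```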

This solution generalizes the concept of a finite state
automaton to a continuum of states and emissions, going
from emission and transition matrices to the generation,
by means of connectionist systems, of nonlinear emission
and transition functions.

The methodology correctly solves the detection of analog
patterns of short length. Note that, since the network
is non-recurrent during training, training can be
carried out with the same genetic methodology as in the
previous cases or with well-established training
algorithms for these topologies, such as gradient
descent. However, as the length of the patterns to be
recognized increases, the two error surfaces become very
complicated, as in the case of phoneme recognition,
whose solution we present below.

3. USE IN PHONEME RECOGNITION 

We have started to use the strategy presented above in
the field of automatic speech recognition. In
particular, our aim is recognition at the phonological
level because, in voice signals, each phoneme presents a
characteristic pattern of parameters that evolve in
time. This training methodology overcomes the problems
associated with classical training procedures for
recurrent neural networks, such as the Boltzmann machine
[6] and backpropagation for recurrent networks [7].

As a first step, we have applied our methodology to the
problem of recognizing the five vocalic phonemes of
Spanish. We have applied the FFT over fixed-size (8 ms)
Hamming windows of signals sampled at 8 kHz. In doing so
we have not lost the inherent sequentiality of the voice
signal, since it is the evolution of the different
windows over time that determines the presence or
absence of a given phoneme. In this way we can use
networks that simulate finite state automata, with
parallel inputs whose temporal evolution represents the
evolution of a given phoneme.
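The front end can be sketched as follows. At 8 kHz an 8 ms window holds 64 samples, and its real FFT yields 33 bins spanning 0 to 4 kHz. Several choices here are our assumptions, not details from the paper: the frames do not overlap, the DC bin is dropped, and adjacent bins are summed in pairs to obtain the 16 inputs mentioned below.

```python
import numpy as np

FS = 8000                 # sampling rate (Hz), as in the text
FRAME = FS * 8 // 1000    # 8 ms -> 64 samples
N_BANDS = 16              # one network input per band

def spectral_frames(signal):
    """Magnitude spectra of consecutive, non-overlapping 8 ms Hamming frames,
    pooled into 16 bands of 250 Hz up to 4 kHz. Returns shape (n_frames, 16)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // FRAME
    frames = signal[: n_frames * FRAME].reshape(n_frames, FRAME)
    spectra = np.abs(np.fft.rfft(frames * np.hamming(FRAME), axis=1))  # 33 bins
    # Drop the DC bin and sum adjacent pairs: bins 1-32 -> 16 bands.
    return spectra[:, 1:].reshape(n_frames, N_BANDS, 2).sum(axis=2)
```

A 1 kHz tone, for instance, falls in bin 8 (125 Hz per bin), and with this pooling bin 8 lands in band 3.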

We have carried out the following tests: 

1. Use of a RANN that processes the current inputs and
the previous state for the recognition of the five
Spanish vocalic phonemes. The network is a perceptron
with two hidden layers, 16 input nodes corresponding to
the frequency spectrum (between 0 and 4 kHz) of each
section, and 10 output nodes: 5 of them represent the
recognition level for each phoneme and the remaining 5
specify the state of each phoneme throughout the
evolution (Figure 1).

Figure 1: Equivalent neural network of the automaton
for the detection of the vocalic phonemes.

The training surfaces have been generated using "ramp"
functions, in both the input and the output spaces. As
pointed out in [8], ramp functions used as target
functions over the duration of a phoneme produce a
linear increase in the estimated detection of the
phoneme as the pieces of the signal are sequentially
input. In other words, the network generates an
increasing level of detection, and of the internal state
corresponding to a phoneme, as it receives correct
samples of it. This allows the network to avoid mistakes
when, at a given instant, it receives distorted input
signals: it ignores them if its internal state level is
sufficiently high, compensating for the wrong input.
With only 12 samples in the training set, belonging to 9
speakers (6 male and 3 female), and with the pertinent
areas of each phoneme segmented by hand in their
intervals of largest energy, we have obtained a
recognition level of 100% on the training set and 97.4%
on the test set.
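Ramp targets for one hand-segmented phoneme might look like this. The exact ramp is our assumption: it rises from 1/n to 1 over the segment's n frames in the correct phoneme's column, with the other columns held at 0. The same array could serve as the target for both the recognition-level outputs and the state outputs. The function name is hypothetical.

```python
import numpy as np

def ramp_targets(n_frames, phoneme, n_phonemes=5):
    """Targets for one hand-segmented phoneme: the column of the correct
    phoneme rises linearly to 1 over the segment; the others stay at 0."""
    targets = np.zeros((n_frames, n_phonemes))
    targets[:, phoneme] = np.arange(1, n_frames + 1) / n_frames
    return targets
```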

2. In this solution the training procedure is too rigid,
in the sense that we are imposing a supervised training
scheme on the evolution of both the outputs and the
state. We are also imposing a linear ramp function on
this evolution. This choice is not critical in the case
of the output space, as we could have chosen other
evolution functions, which in the end would only specify
the recognition levels during the evolution and would
not actually affect it. But it does seem a very
restrictive criterion for the evolution of the state
space, which is fed back as input in the next step and
consequently influences subsequent outputs and states.
It would probably be more convenient to obtain some type
of less supervised training for the state variables,
allowing the network to determine their correct values
during the evolution.

We have employed a strategy that, although supervised,
is less restrictive regarding both the state values and
the recognition levels for each phoneme:

i) if the largest of the outputs that encode the state
corresponds to the correct phoneme, then we do not
train, or we train less, over these outputs. The same
applies to the recognition-level nodes.

ii) Otherwise, we train all of the outputs normally.

Over the same samples as in the previous case, the
recognition levels are very similar, 100% for the
training set and 94.1% for the test set, but this less
restrictive supervision of all the output nodes of the
network has a very important consequence. We now
generate output maps that are close to those produced by
unsupervised training, in the sense that we obtain a
clustering of the inputs: the outputs take similar values
in the nodes that represent phonemes with similar
features, in this case similar frequency spectra.
Figure 2 represents the gradual transition from the
phoneme /a/ to /e/ with this strategy.

Figure 2: Recognition level in the transition between
the phonemes /a/ and /e/ in continuous speech with the
second strategy.
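Rules i) and ii) of the second strategy can be read as a weight applied to the error of each output group. This sketch assumes that "train less" means scaling that group's error by a factor `relax`, where 0 means no training at all. The factor and the function name are our assumptions. The weight would be computed separately for the state nodes and the recognition-level nodes.

```python
import numpy as np

def error_weights(outputs, target_phoneme, relax=0.0):
    """Weight for the error of one output group (state or recognition level).

    i)  if the largest output already belongs to the correct phoneme, the
        group's error is scaled by `relax` (0 = do not train, < 1 = train less);
    ii) otherwise the group is trained normally (weight 1).
    """
    outputs = np.asarray(outputs, dtype=float)
    w = relax if int(np.argmax(outputs)) == target_phoneme else 1.0
    return np.full(outputs.shape, w)
```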

We must now extend this network model to the rest of the
phonemes of Spanish. To do this we have chosen a
structure in which we create specialized submodules for
the recognition of different groups of phonemes. Thus we
propose the following modules: vocalic, voiced and
unvoiced occlusive, fricative, lateral and nasal. The
modularization of the recognition has the clear
advantage of facilitating training in each submodule of
the network, and at the same time it injects knowledge
into the global structure of the recognizer.

In the case of consonant phonemes, we find new problems.
For instance, in unvoiced phonemes such as /f/, /s/ or
/z/, the non-periodicity of the waveform makes it very
difficult to obtain a window width appropriate for the
computation of the FFT. The FFT is not going to be as
constant as in other phonemes, especially because of the
noise present in these signals, whose energy level is
much lower than that of vowels. For this reason we have
applied a low-pass filter over the temporal evolution of
the spectra, obtaining, with the second strategy, 98.1%
recognition over the training samples and 70% over the
test samples with the same group of speakers.
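The paper does not specify the filter. One possibility is a first-order (exponential) low-pass filter applied to each band independently along the time axis. The smoothing factor `alpha` and the function name are assumptions.

```python
import numpy as np

def lowpass_over_time(spectra, alpha=0.3):
    """First-order IIR low-pass along time: y[t] = y[t-1] + alpha*(x[t] - y[t-1]).

    spectra: array of shape (n_frames, n_bands); each band is filtered separately.
    """
    spectra = np.asarray(spectra, dtype=float)
    out = np.empty_like(spectra)
    out[0] = spectra[0]
    for t in range(1, len(spectra)):
        out[t] = out[t - 1] + alpha * (spectra[t] - out[t - 1])
    return out
```

A smaller `alpha` smooths more strongly, trading responsiveness to genuine spectral changes for suppression of frame-to-frame noise.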
4. FUTURE WORK

We have applied this methodology to phoneme recognition
by training the RANNs with samples that were segmented
by hand. This permits a sufficiently clear detection of
the phonemes when testing with continuous speech
samples, as we do not detect the presence of a phoneme
at a single instant through the analysis of one frame,
but through its evolution over the duration of the
phoneme. From this point on, we must try to include in
the training samples the coarticulation effects of each
phoneme with its possible neighbors in order to improve
the quality of the recognition. In addition, the network
must detect phonemes independently not only of their
temporal lengths, but also of the lengths of the
features whose evolution determines the presence of a
phoneme. That is, we must make the automaton more
flexible with respect to temporal expansions or
contractions of the signal features. This is the case
for the voiced stop consonants (/b/, /d/ and /g/), with
their characteristics of constriction in the vocal
tract, low-frequency energy and final explosion lacking
high-frequency energy, and for the unvoiced stop
phonemes (/p/, /t/ and /k/), with their characteristics
of constriction, frication, aspiration and voiced
explosion. These characteristics vary in temporal
length, and this variation, within an explosive
interval that is extremely short compared with the
duration of other phonemes, is of the utmost importance
for their recognition.

5. CONCLUSIONS 

These preliminary results are very promising, as they
show a path toward a simple implementation of speech
recognition systems that uses the inherent sequentiality
of speech signals as a resource to improve recognition.
In addition, the modularization of the systems reduces
the ever-present complexity of training them, allowing a
better choice of training sets, some control over the
sharpness of the desired detection, and a reduction of
the problems introduced by windowing. Because of the
usual low quality of speech signals and the noise levels
in normal speaking environments, recognition of
individual phonemes in real speech will never be
perfect. It is therefore necessary to develop mechanisms
for the interpolation of mistakenly recognized phonemes,
which can only be carried out by introducing context
knowledge into the recognition systems. Their
modularization will also help in this process.
ABSTRACT

The undesirable variability caused by foreign accent in
speaker-independent speech recognition systems [6] is
claimed to be due to interference from the speaker's
native language [3, 9]. The pronunciation of particular
vowels is the most common and simplest type of variation
between accents. The formant structure is the basis for
the recognition of most vowel differences [5], and vowel
color is mainly determined by the first two formant
frequencies. This work explores the nature of accent in
general, highlights the differences between Spanish and
English, and analyzes the first two formants of the
vowels of Spanish-accented English. These are used as
features for the detection of Spanish accent. The
conclusion is that Spanish accent is detectable using
vowel formants as features.

Introduction 

Accent, if defined as a manner of pronunciation, is
ubiquitous, since variability is inherent to
pronunciation within and between speakers. In spite of
this variability, the perceiver discriminates the
patterns of the speech sounds, partly through the
distinctiveness of the patterns in the artifacts and
partly through prediction. Therefore, for communication
to occur, the contrasts on which the patterns are based
should not be obscured. Some of the linguistic
regularities of the patterns could be exploited by an
Automatic Speech Recognition (ASR) system [2].
Identifying an accent will allow an ASR system to improve
its performance by making use of explicit knowledge
about that accent.

Besides carrying the patterns that convey language,
speech sounds accommodate nonlinguistic signs. These
signs may be called Indexical Features, some of which
are distinctive ways of pronouncing certain vowels or
consonants, or distinctive word and sentence stress and
intonation patterns.

Almost all speakers of all languages have Regional
Indices in their pronunciation. The word accent, in its
popular sense, is usually used to refer to these regional
indices. These indices can alternatively be used to
assign a different meaning to the words with which they
are used [9].

Foreign accent, according to Chreist [3], is the
inability of an individual habituated to the patterns of
his native language to hear the fine differences present
in the sound pattern of a second language. In agreement
with him, Lado [8] says that the speaker of one language
listening to another does not actually hear the foreign
language's sound units (phonemes); he hears his own.
Phonemic differences in the foreign language will be
consistently missed by him if there are no similar
phonemic differences in his native language.

Hispanics whose native language is Spanish may have
problems with some English vowels, consonants, consonant
clusters, word and sentence stress, and intonation.
According to interference theory, accent differences can
be predicted to be either systemic, i.e., caused by a
different number of phonemes in the two languages, or
realizational, i.e., related to how the speakers actually
pronounce the phonemes. Realizational differences
account for most of the distinction between one accent
and another, and hence provide most of the identifying
characteristics of each accent. Usually, these
differences pervade the whole vowel system and make up
the distinguishing characteristics of a particular
accent [9].

Duration, another important feature for distinguishing
vowels like [UW](two) and [OW](four), must await a more
representative sample. The trend so far, as predicted,
is that these vowels are longer for native speakers,
most probably due to the diphthongization of single
vowels, which does not occur in Spanish.



Some American authors claim that there is no variation
in length in Spanish vowels, and this is true when
compared with the range of variation in length of
English vowels, where, in stressed positions, the shorter
vowels (IH, UH, EY and AH) are more central. It would be
closer to the truth to say that in Spanish, distinctive
differences of duration are not stable [4]. Therefore,
in Spanish, vowel duration does not have a contrastive
function.

Comparison between Spanish and English

Spanish is one of the languages in which the spelling
indicates the sounds of the letters. Consonants and
vowels are distributed about equally and are pronounced
with relatively the same duration. The five vowels are
pure in sound quality, and two or more frequently occur
in consecutive arrangement... There are fourteen
diphthongs. A few consonant clusters are in the system.
Aspiration is not strongly applied to either consonants
or vowels. There are nineteen consonant phonemes, of
which [ch] and [f] are lowest in frequency count. Of the
vowels, [i] and [u] have the lowest frequency count
[10].

English, compared with Spanish, is a relatively
unphonetic language. It is not always possible to
determine the sound from the spelling. The consonant
distribution exceeds that of the vowels, and there is
variation in the duration applied as each is pronounced.
There are eleven vowels (vowel sounds). Authorities
differ on the number of diphthongs. The vowels are not
pure in sound quality, and there is a tendency to
lengthen them into diphthongs, particularly in some
regions. Two vowels may occur in consecutive
arrangement. Weak, or unstressed, vowels are frequent.
There are 25 consonant phonemes, and consonant clusters
are typical, with many occurring in final position [11].

Spanish does not have the following vowel sounds. Each
is followed, when predictable, by the vowel that
Hispanics might substitute for it [7].

IH (bit, hid) with [IY](beat, heed).

EY (bait, hayed) ?

AE (bat, had) with [EH](bet, head) or [AA](father, hod).

AX (but, ago) with the vowel suggested by the spelling.

UH (book, hood) with [UW](boot).

OW (boat) with [AO](bought).

AH (mud) ?


Experiments 

The experiments focus on the formant structure of
Spanish-accented vowels. Some of the vowel pronunciation
problems for Hispanics, such as [UH](bulls), [AH](but)
and [ER](girls), have been overcome or blurred by the
speakers analyzed so far, which prevents easy
discrimination. The closest vowel is [ER](girls), even
though it does not exist in Spanish. The most separable
vowel turned out to be [AE](ran), for which Hispanics
substitute a vowel close to the Spanish vowel [a]
(padre, father in Spanish), the most frequent vowel in
Spanish. The Spanish vowel [a](padre) sits between the
English vowels [AA](father) and [AE](ran).

All the vowels were analyzed. Segmentation was done by a
combination of a dynamic programming procedure that
looks at spectral coefficients, listening to the sounds,
and examining the evolution of the formants. The data,
originally sampled at 16 kHz, were bandpass filtered and
resampled at 10 kHz, and, together with sets of
excitation markers, the formants (F1 to F4) were
extracted. For each speaker, the means of the first two
formants were calculated. Then the overall mean (over
all speakers) of each formant was calculated. These last
values for the first two formants are the ones reported
in tabular and graphical form, together with the typical
values for American English.
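The two-stage averaging just described can be sketched as follows. It first averages each speaker's tokens and then averages the speaker means. As a result, every speaker counts equally, regardless of how many tokens he produced. The function name and the token values in the usage note are hypothetical.

```python
import numpy as np

def overall_formant_means(tokens_by_speaker):
    """tokens_by_speaker: dict speaker -> sequence of (F1, F2) pairs in Hz.

    Averages each speaker's tokens, then averages the speaker means.
    """
    speaker_means = np.array([np.mean(np.asarray(v, dtype=float), axis=0)
                              for v in tokens_by_speaker.values()])
    return speaker_means.mean(axis=0)
```

For example, with made-up tokens `{"a": [[600, 1400], [700, 1500]], "b": [[800, 1600]]}` the result is (725, 1525) Hz. Pooling the three tokens directly would give (700, 1500) Hz instead.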

Average durations for the words two [UW] and four [OW]
are shown below. As predicted, the Americans' average is
longer.

From these data, the vowel in which accent is easiest to
detect is [AE], either by simply using a linear
discriminant function or by measuring the Euclidean
distance with two or three formants.
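The Euclidean-distance alternative for [AE] can be sketched with the group means from the formant table in this paper: Hispanic F1 = 670 Hz, F2 = 1422 Hz, and typical American F1 = 660 Hz, F2 = 1720 Hz. The classifier assigns a token to the nearer mean. The test tokens are made up for illustration.

```python
import math

# Mean (F1, F2) in Hz for [AE], taken from the formant table.
AE_MEANS = {"Hispanic": (670.0, 1422.0), "American": (660.0, 1720.0)}

def nearest_group(f1, f2, means=AE_MEANS):
    """Assign a vowel token to the group whose mean formants are closest."""
    return min(means, key=lambda g: math.dist((f1, f2), means[g]))

print(nearest_group(665.0, 1450.0))  # hypothetical token -> Hispanic
```

Because the two group means differ mainly in F2, this rule is effectively a threshold on F2 near 1570 Hz.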

A linear discriminant (Figure 2) shows that, in this
sample, with the vowel [AE](ran) only one of the 11
Hispanics is classified as American. With the vowel
[IH], three are misclassified, followed by [AA](father),
etc. Adding the third formant did not modify the
classifications.
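The paper does not say which linear discriminant was used. A common choice is Fisher's, sketched below for two groups of (F1, F2) vectors. The threshold is the midpoint between the projected group means, an assumption of ours that ignores class priors. The test uses synthetic data, not the study's speakers.

```python
import numpy as np

def fisher_lda(x0, x1):
    """Fisher linear discriminant for two classes of formant vectors.

    Returns (w, threshold): classify x as class 1 when x @ w > threshold.
    """
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    # Pooled within-class scatter matrix.
    sw = (np.cov(x0, rowvar=False) * (len(x0) - 1)
          + np.cov(x1, rowvar=False) * (len(x1) - 1))
    w = np.linalg.solve(sw, m1 - m0)
    threshold = 0.5 * (m0 @ w + m1 @ w)
    return w, threshold
```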

Database 

The experiments were done with eleven adult males
(natives of Cuba and Peru) from a database of
Spanish-accented English speakers from the Miami area¹,
and six adult males from a growing home-based database²
of native American speakers. The American English
database has been used for contrasting and
classification purposes.

¹Provided by Dr. Marc Zissman from MIT.

²Partly with the help of Kevin Campbell, Music Dept., UNM.
Figure 1: F2 vs. F1 for Hispanics and typical American
English


Formant vowel frequencies in Hz for
Spanish-accented English and Std. English

Vowel   F1(Hisp)  F1(Am)  F2(Hisp)  F2(Am)
[AA]      663      730      1584     1090
[AE]      670      660      1422     1720
[AH]      534      520      1282     1190
[AO]      556      570      1121      840
[IH]      369      390      1998     1990
[EH]      515      530      1641     1840
[EY]      392      400      2056     2100
[IY]      328      270      2160     2290
[OW]      471      450       799      900
[UH]      439      442       827     1020
[UW]      331      300       983      870
[ER]      491      490      1361     1350


Formant vowel frequencies in Hz
for Spanish-accented English and Spanish.

Vowel   F1(Hisp.)  F2(Hisp.)  F1(Sp.)  F2(Sp.)
[AA]       687       1580       725     1300
[EY]       392       2056       450     1900
[IY]       328       2160       275     2300
[OW]       471        799       450      900
[UW]       330        974       275      800


Average duration in seconds

Word        Hisp.   Am.
two [UW]    0.26    0.43
four [OW]   0.32    0.39


Figure 2: Linear discriminant for vowel /AE/(ran)

Conclusions 

Spanish-accented English is detectable using vowel
formants as features. The vowel [AE](ran) is the most
separable, followed by [IH].

Therefore, to detect Spanish-accented English from
vowels with the highest probability, this study
indicates that the vowel [AE](ran) is the best choice.

Future Work 

From the current experience with the Miami database, the similarities far outweigh the differences. Once the accent has been detected, recognition could be done in several ways:

• a spectral transformation that establishes a correspondence between pairs of “typical” spectra from two talkers, based on their occurrence in the same context (for example, vowels embedded in carrier words);

• Hidden Markov Models trained separately for Hispanic speakers; or

• Multiple Pronunciation Models, supported by rules based on knowledge of the characteristics of Hispanic speakers.
ABSTRACT

This tutorial reviews the state of the art in applications 
of chaos in broadband communications. 

1. INTRODUCTION 

The goal of a digital communication system is to convey information from a digital information source to a receiver through a channel as effectively as possible [1]. This is done by mapping the digital information to a sequence of symbols. Each symbol varies some parameter of an analog electromagnetic wave called the carrier (modulation). At the receiver, the signal is demodulated and interpreted, and the information is recovered.

The mapping from baseband digital information to a passband carrier signal may be accompanied by encryption and coding. These add end-to-end security, data compression, and error-correction capability.

Built-in error-correction capability is required because real channels distort analog signals through a variety of linear and nonlinear mechanisms: attenuation, dispersion, fading, noise, interference, multipath effects, etc. A channel encoder introduces algorithmic redundancy into the transmitted symbol sequence, which reduces the probability of incorrect decisions at the receiver.

Modulation is the process by which a symbol is transformed into an analog waveform suitable for transmission. Common digital modulation schemes include Phase-Shift Keying (PSK) and Frequency-Shift Keying (FSK). In these schemes each symbol is mapped one-to-one onto a phase (PSK) or a frequency (FSK) of a sinusoidal carrier.
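To make the phase and frequency mappings concrete, here is a sketch of binary PSK and binary FSK with matching coherent demodulators. For illustration only, it assumes a carrier with an integer number of cycles per bit and 32 samples per bit. With these choices the carriers used by the demodulators are exactly orthogonal over one bit period.

```python
import math

SAMPLES_PER_BIT = 32  # assumed sampling density


def carrier(freq, k, n=SAMPLES_PER_BIT, phase=0.0):
    """Sample k of a cosine carrier with `freq` cycles per bit period."""
    return math.cos(2 * math.pi * freq * k / n + phase)


def bpsk_modulate(bits, fc=4):
    """Binary PSK: bit 1 -> carrier phase 0, bit 0 -> carrier phase pi."""
    return [carrier(fc, k, phase=0.0 if b else math.pi)
            for b in bits for k in range(SAMPLES_PER_BIT)]


def bfsk_modulate(bits, f0=4, f1=8):
    """Binary FSK: bit 0 -> frequency f0, bit 1 -> frequency f1."""
    return [carrier(f1 if b else f0, k) for b in bits for k in range(SAMPLES_PER_BIT)]


def segments(signal):
    """Split a sampled signal into bit-period segments."""
    for i in range(0, len(signal), SAMPLES_PER_BIT):
        yield signal[i:i + SAMPLES_PER_BIT]


def correlate(segment, freq):
    """Correlate one bit period of samples with a zero-phase carrier."""
    return sum(s * carrier(freq, k) for k, s in enumerate(segment))


def bpsk_demodulate(signal, fc=4):
    return [1 if correlate(seg, fc) > 0 else 0 for seg in segments(signal)]


def bfsk_demodulate(signal, f0=4, f1=8):
    return [1 if correlate(seg, f1) > correlate(seg, f0) else 0 for seg in segments(signal)]


bits = [1, 0, 1, 1, 0]
print(bpsk_demodulate(bpsk_modulate(bits)) == bits)
print(bfsk_demodulate(bfsk_modulate(bits)) == bits)
```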

The channel is the physical medium that carries the signal from the transmitter to the receiver. The signal inevitably becomes corrupted in the channel, so the receiver seldom receives exactly what was transmitted. The demodulator's role is to produce, from the corrupted received signal, an estimate of the transmitted symbol sequence. The channel decoder then exploits redundancy in the transmitted sequence to reconstruct the original information. Because of disturbances in real channels, error-free transmission cannot be guaranteed.

The author acknowledges invaluable discussions with G. Kolumban, TU Budapest.

Figure 1: Digital communication system showing source and channel coding, modulation, and channel (block diagram from “Information in” to “Information out”).

The performance of the communication system is measured by the bit error rate (BER) at the receiver. In general, the BER depends on:

• the coding scheme,
• the type of waveform used,
• the transmitter power,
• the channel characteristics, and
• the demodulation scheme.

The conventional way to show performance in a linear channel with Additive White Gaussian Noise (AWGN) is to plot the bit error rate against $E_b/N_0$. Here $E_b$ is the energy per bit and $N_0$ is the power spectral density of the noise introduced in the channel.

For a given background noise level, the BER may be reduced by increasing the energy per bit, either by transmitting with higher power or by transmitting each bit for a longer period. The challenge in digital communications is to achieve a specified BER with minimum energy per bit. A further consideration is bandwidth efficiency [1].

Figure 2: Comparison of the noise performances of two digital modulation schemes: differential PSK (solid) and noncoherent FSK (dashed).
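The two curves of Figure 2 can be reproduced from the standard textbook AWGN expressions. These formulas are assumed here, since this paper does not state them:

• binary DPSK: P_b = 0.5 exp(-E_b/N_0)
• binary noncoherent FSK: P_b = 0.5 exp(-E_b/(2 N_0))

The sketch evaluates both over a range of E_b/N_0 values in dB.

```python
import math


def ebn0_linear(ebn0_db):
    """Convert Eb/N0 from dB to a linear ratio."""
    return 10 ** (ebn0_db / 10)


def ber_dpsk(ebn0_db):
    """Binary DPSK in AWGN: Pb = 0.5 * exp(-Eb/N0)."""
    return 0.5 * math.exp(-ebn0_linear(ebn0_db))


def ber_ncfsk(ebn0_db):
    """Binary noncoherent FSK in AWGN: Pb = 0.5 * exp(-Eb/(2 N0))."""
    return 0.5 * math.exp(-ebn0_linear(ebn0_db) / 2)


for db in range(0, 14, 2):
    print(f"Eb/N0 = {db:2d} dB   DPSK {ber_dpsk(db):.2e}   NCFSK {ber_ncfsk(db):.2e}")
```

Under these formulas, noncoherent FSK needs about 3 dB (a factor of 2) more E_b/N_0 than DPSK to reach the same BER. That is the horizontal offset between the two curves.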

Nonlinear dynamics has potential applications in several building blocks of a digital communication system: data compression, encryption, and modulation. In this paper, we focus primarily on the application of chaos as a spread spectrum modulation scheme.

2. SPREAD SPECTRUM MODULATION 

In spread spectrum modulation, the transmitted signal is spread over a much larger bandwidth than is necessary to transmit the baseband information. Spread spectrum can be used for:

• combating the effects of interference due to jamming, other users, and multipath effects,

• hiding a signal “in the noise” by transmitting it at low power, and

• achieving message privacy in the presence of eavesdroppers.

Conventional spread spectrum communication systems use pseudorandom (PN) spreading sequences to distribute the energy of the information signal over a wide bandwidth. The transmitted signal resembles random noise and is therefore difficult for eavesdroppers to detect. With a synchronized receiver, interference can be suppressed by despreading. In addition, orthogonal pseudorandom spreading sequences allow multiple users to communicate simultaneously on the same channel (CDMA).
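Here is a small direct-sequence sketch of spreading, despreading, and code-division access. Each bit becomes +1 or -1 times a chip sequence, and the receiver correlates each block with the same sequence. As a deliberate simplification, the sketch uses two orthogonal Walsh codes of length 8 instead of PN sequences. With exact orthogonality, the other user's signal and a constant (DC) interferer cancel exactly in the correlation. Real PN codes are only approximately orthogonal.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def spread(bits, code):
    """Map bits {0, 1} to symbols {-1, +1} and multiply each by the chip sequence."""
    chips = []
    for b in bits:
        symbol = 1 if b else -1
        chips.extend(symbol * c for c in code)
    return chips


def despread(received, code):
    """Correlate each code-length block with the code and decide by sign."""
    n = len(code)
    bits = []
    for i in range(0, len(received), n):
        corr = sum(r * c for r, c in zip(received[i:i + n], code))
        bits.append(1 if corr > 0 else 0)
    return bits


codes = hadamard(8)
code_a, code_b = codes[1], codes[2]  # two orthogonal, zero-mean Walsh codes
bits_a, bits_b = [1, 0, 1, 1], [0, 0, 1, 0]
interference = 0.9  # constant narrowband (DC) interferer added to every chip
channel = [a + b + interference
           for a, b in zip(spread(bits_a, code_a), spread(bits_b, code_b))]
print(despread(channel, code_a), despread(channel, code_b))
```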


Spread spectrum techniques are suited for applications in:

• satellite communications (low power spectral density),

• mobile phones (privacy, high tolerance against multipath effects, multiple users), and

• military communications (low probability of intercept).

2.1. PSEUDORANDOM VS. CHAOTIC 

Pseudorandom (PN) spreading sequences are widely used in spread spectrum communications for three reasons: their statistics and orthogonality properties are well understood, they are easy to generate, and they are easy to synchronize. However, the inherent periodicity of a pseudorandom sequence compromises the overall security of a spread spectrum communication system. The longer the pseudorandom sequence, the higher the level of security, but the harder it is to establish synchronization at the demodulator.

Figure 3 shows a block diagram of a spread spectrum system using a PN spreader. The modulator spreads the data stream from the channel encoder according to the spreading sequence, and transmits it on a sinusoidal carrier using PSK or FSK.

Figure 3: Spread spectrum communication system using a conventional PN spreader (block diagram from “Data in” to “Data out”).

A pseudorandom sequence generator is a special 
case of a chaotic system, the difference being that the 
chaotic system has an infinite number of (analog) states, 
while the pseudorandom generator has a finite number. 
A pseudorandom sequence is produced by visiting each 
state of the system once in a deterministic manner. 
With only a finite number of states to visit, the output 
sequence is necessarily periodic. By contrast, a chaotic 
generator can visit an infinite number of states in a 
deterministic manner and therefore produce an output 
sequence which never repeats itself. 
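The periodicity argument can be checked on a small example. The sketch uses a 4-bit Galois linear-feedback shift register, an assumed but common PN generator structure, with tap mask 0b1100. It visits all 2^4 - 1 = 15 nonzero states and then repeats.

```python
def lfsr_states(seed=0b0001, mask=0b1100):
    """Yield the states of a 4-bit Galois LFSR until the seed state recurs.

    Each step shifts right; if the bit shifted out was 1, the tap mask is XORed in.
    """
    state = seed
    while True:
        yield state
        out_bit = state & 1
        state >>= 1
        if out_bit:
            state ^= mask
        if state == seed:
            return


states = list(lfsr_states())
output_bits = [s & 1 for s in states]  # one output bit per state
print(len(states))  # period of the sequence: 15
print(output_bits)
```

Any digital simulation of a chaotic map is also a finite-state machine, so it too must eventually repeat. The unbounded period is a property of the analog chaotic circuits discussed here.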

What are the advantages of using chaos in spread 
spectrum communication systems? With appropriate 
modulation and demodulation techniques, the “noise-like” spectral properties of chaotic electronic circuits
[2] can be used to provide simultaneous spreading and 
modulation of a transmission. The simplicity of the 
analog circuits involved could permit extremely high 
speed, low power implementations. 
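As an illustration of this noise-like behaviour, the sketch below iterates the Chebyshev map x -> 1 - 2x^2. This simple chaotic map is chosen here as a stand-in for the chaotic circuits of [2]. The sketch estimates the sample autocorrelation of the map's output. In theory the autocorrelation of this map is zero at every nonzero lag, so the estimates should be close to zero, as they would be for white noise.

```python
def chebyshev_sequence(x0=0.3, n=20000):
    """Iterate the chaotic Chebyshev map x -> 1 - 2 x^2, which maps [-1, 1] onto itself."""
    xs, x = [], x0
    for _ in range(n):
        x = 1.0 - 2.0 * x * x
        xs.append(x)
    return xs


def autocorrelation(xs, lag):
    """Normalized sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / (n - lag)
    return cov / var


xs = chebyshev_sequence()
print([round(autocorrelation(xs, k), 3) for k in range(5)])
```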
3. CHAOTIC MODULATION: STATE OF THE ART

Exploratory studies of communicating with chaos have been carried out over the past five years [3]. To date, techniques have been developed for:

• generating chaotic signals,
• chaotic modulation and demodulation, and
• self-synchronizing chaotic receivers.

In the remainder of this paper, we summarize the current state of knowledge in each of these domains. We also highlight where improvements are needed to realize a practical spread spectrum communication system that exploits chaos.

3.1. GENERATION OF CHAOTIC SPREADING SIGNALS

There are two possibilities here:

• the baseband information signal may be spread at an intermediate frequency and up-converted using a conventional mixer and power amplifier, or
• the spreading may be done directly at the transmission frequency.

Widely studied circuits such as Chua’s oscillator [2] may be used as lowpass chaotic signal generators, with their outputs mixed up to the RF transmission band. The main disadvantage of this approach is that linear wideband circuitry is required in both the mixer and the power amplifier stages.

The chaotic analog phase-locked loop (APLL) introduced by Kolumban [4] and shown in Fig. 4 offers a cost-effective way to generate a wideband RF spread spectrum signal directly, at high power.

Figure 4: Nonlinear baseband model of the chaotic APLL [4]. F(s) is a second-order low-pass filter (LPF).

3.2. CHAOTIC MODULATION AND SPREADING SCHEMES

Over the past three years, five chaos-based modulation and spreading techniques have been developed:

• chaotic masking,
• inverse systems,
• Predictive Poincare Control (PPC) modulation,
• Chaos Shift Keying (CSK), and,
• most recently, differential CSK (DCSK).

Each technique described below has been demonstrated experimentally, using either discrete components [5] or dedicated circuitry. More recently, prototype spreaders have been realized as integrated circuits [6].

3.2.1. CHAOTIC MASKING

In chaotic masking [7], the information signal s(t) is spread by adding it to the output x(t) of a chaotic system. The resulting signal s(t) + x(t) is modulated and transmitted [3]. If s(t) is small compared to x(t), an identical chaotic system in the receiver can be made to synchronize with x(t). The receiver's system then effectively ignores the “disturbance” s(t). So s(t) can be retrieved simply by subtracting the output of the receiver’s chaotic system from the received signal.

The drawback of chaotic masking is that distortion and noise introduced by the channel cannot be distinguished from the signal.

3.2.2. INVERSE SYSTEMS 

In the inverse system approach [8], the transmitter is a chaotic system that is excited by the information signal s(t) and produces a chaotic output y(t). The receiver is an inverse system, i.e., one that produces ŝ(t) = s(t) as output when excited by y(t) and started from the same initial condition. If the system is properly designed, ŝ(t) converges to s(t) regardless of the initial conditions. Inverse systems are widely used in digital encryption and spreading schemes.
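Here is a discrete-time sketch of the inverse-system idea. The paper gives no equations, so the following construction is our own assumption. The transmitter is driven by the message s_n and computes y_n = (tent(y_{n-1}) + s_n) mod 1, where tent is the tent map. The receiver computes ŝ_n = (y_n - tent(y_{n-1})) mod 1 from received samples only. Its only memory is the previous received sample, so a wrong initial guess corrupts only the first recovered symbol.

```python
def tent(y):
    """Tent map on [0, 1]: slope +/-2, chaotic in exact arithmetic."""
    return 2.0 * y if y < 0.5 else 2.0 * (1.0 - y)


def transmit(message, y0=0.7):
    """Chaotic system driven by the message: y_n = (tent(y_{n-1}) + s_n) mod 1."""
    out, y = [], y0
    for s in message:
        y = (tent(y) + s) % 1.0
        out.append(y)
    return out


def receive(received, y_prev=0.123):
    """Inverse system: s_n = (y_n - tent(y_{n-1})) mod 1, using received samples only.

    y_prev is the receiver's (deliberately wrong) guess of the transmitter's initial state.
    """
    out = []
    for y in received:
        out.append((y - tent(y_prev)) % 1.0)
        y_prev = y
    return out


def circular_error(a, b):
    """Distance between two values on the unit circle [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


message = [0.10, 0.25, 0.40, 0.05, 0.30, 0.15]
recovered = receive(transmit(message))
print([round(r, 6) for r in recovered])
```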

3.2.3. PREDICTIVE POINCARE CONTROL 
(PPC) MODULATION 

In Predictive Poincare Control (PPC) modulation, symbolic analysis of chaotic systems is used to encode and decode the information [9]. With an appropriate control strategy, the transmitter is forced to follow a prescribed path in a symbolic sense. On the receiver side, an identical chaotic system synchronizes approximately with the transmitter system. The information signal is retrieved by identifying the symbolic path in the synchronized receiver.

Although each of these techniques has been demonstrated in the laboratory, none has been tested experimentally over a noisy communication channel. Indeed, recent simulations [10] suggest that masking and inverse systems perform poorly when the transmitted signal is corrupted by noise. A more robust method is Chaos Shift Keying.

3.2.4. CHAOS SHIFT KEYING (CSK) 

In binary Chaos Shift Keying [5], an information signal is encoded by transmitting one chaotic signal for a “1” and another chaotic signal for a “0”. The two chaotic signals come from two different systems, or from the same system with different parameters.
Two demodulation schemes are possible: coherent and noncoherent.

The coherent receiver contains copies of the systems corresponding to “0” and “1”. Depending on the transmitted signal, one copy synchronizes with the incoming signal and the other desynchronizes. By detecting which copy has synchronized, the receiver determines which bit is being transmitted.

In noncoherent demodulation, no attempt is made to recover the carrier at the receiver. Instead, the receiver simply examines statistical attributes of the received signal.

3.2.5. DIFFERENTIAL CSK (DCSK) 

Differential CSK [11] is a development of CSK that is less sensitive to channel imperfections than any of the techniques outlined above.

In DCSK, the modulator is a free-running chaotic generator with output x(t). Each binary symbol is transmitted over two consecutive bit periods. The first period carries a reference and the second carries the information:

• To transmit a “1”, x(t) is sent in the first bit period and a one-bit-delayed copy of x(t) in the second.
• To transmit a “0”, x(t) is sent in the first bit period and an inverted one-bit-delayed copy of it in the second.

At the receiver, the signal received in each bit period is correlated with the signal received in the previous bit interval, and the decision is based on the correlator output. Both signals presented to the correlator have passed through the same channel, which makes DCSK robust to channel imperfections.
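Here is a baseband DCSK sketch. It assumes the Chebyshev map x -> 1 - 2x^2 as the chaotic generator, standing in for the APLL of [4], and 64 samples per half-symbol. Each bit sends a reference segment followed by the same segment multiplied by +1 or -1. The receiver decides from the sign of the correlation between the two halves.

```python
import random
from itertools import islice


def chebyshev_map(x0):
    """Endless chaotic sequence from the Chebyshev map x -> 1 - 2 x^2."""
    x = x0
    while True:
        x = 1.0 - 2.0 * x * x
        yield x


def dcsk_modulate(bits, chips_per_slot=64, x0=0.37):
    """Each bit -> reference segment x, then +x (bit 1) or -x (bit 0)."""
    source = chebyshev_map(x0)
    signal = []
    for b in bits:
        reference = list(islice(source, chips_per_slot))
        sign = 1.0 if b else -1.0
        signal.extend(reference)
        signal.extend(sign * v for v in reference)
    return signal


def dcsk_demodulate(signal, chips_per_slot=64):
    """Correlate each information segment with the reference received just before it."""
    bits = []
    symbol_len = 2 * chips_per_slot
    for i in range(0, len(signal), symbol_len):
        reference = signal[i:i + chips_per_slot]
        information = signal[i + chips_per_slot:i + symbol_len]
        correlation = sum(r * s for r, s in zip(reference, information))
        bits.append(1 if correlation > 0 else 0)
    return bits


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(20)]
tx = dcsk_modulate(bits)
rx = [v + rng.gauss(0.0, 0.1) for v in tx]  # additive white Gaussian channel noise
print(dcsk_demodulate(rx) == bits)
```

The receiver needs no copy of the chaotic generator and no synchronization. Both halves of each symbol see the same channel, so only their relative sign matters.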

4. DIGITAL DEMODULATION 

Digital demodulation is the process by which the receiver recovers the transmitted digital information signal from the incoming modulated wave. The demodulator may use either noncoherent or coherent techniques. These typically involve correlation detection or statistical tests in each bit interval, followed by a threshold-based decision.

4.1. NONCOHERENT DEMODULATION 

Noncoherent demodulation techniques have been proposed for CSK [4, 12] using a transmitter composed of two chaotic APLLs, one for each symbol (“1” and “0”). Statistics of the received signal (mean and standard deviation) are used in decision-making.
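Here is a sketch of noncoherent CSK with a statistics-based decision. It uses two hypothetical simplifications. First, the two chaotic generators are modeled as the same Chebyshev map with different output amplitudes. Second, the receiver compares the RMS value of each bit segment against a threshold; since the mean is near zero, this is essentially the standard deviation. The actual proposal [4, 12] uses two APLLs and both mean and standard deviation.

```python
import math
import random
from itertools import islice


def chebyshev_map(x0):
    """Endless chaotic sequence from the Chebyshev map x -> 1 - 2 x^2."""
    x = x0
    while True:
        x = 1.0 - 2.0 * x * x
        yield x


def csk_modulate(bits, samples_per_bit=200, amp1=1.0, amp0=0.3):
    """Send a segment from generator '1' or generator '0' for each bit.

    Hypothetical generators: the same chaotic map with different amplitudes,
    so the two signals differ in their second-order statistics.
    """
    gen1, gen0 = chebyshev_map(0.21), chebyshev_map(0.64)
    signal = []
    for b in bits:
        gen, amp = (gen1, amp1) if b else (gen0, amp0)
        signal.extend(amp * v for v in islice(gen, samples_per_bit))
    return signal


def csk_demodulate(signal, samples_per_bit=200, threshold=0.5):
    """Noncoherent decision: compare the RMS value of each bit segment to a threshold."""
    bits = []
    for i in range(0, len(signal), samples_per_bit):
        segment = signal[i:i + samples_per_bit]
        rms = math.sqrt(sum(v * v for v in segment) / len(segment))
        bits.append(1 if rms > threshold else 0)
    return bits


rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(20)]
rx = [v + rng.gauss(0.0, 0.05) for v in csk_modulate(bits)]
print(csk_demodulate(rx) == bits)
```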

For DCSK using APLLs, a noncoherent demodulation scheme has been proposed in which the incoming signal is correlated with a delayed version of itself. The correlator output is positive (negative) when a “1” (“0”) is transmitted and tends to zero between adjacent bits [11]. Figure 5 shows simulations of this system with additive channel noise. Decisions are made at the peaks of the correlator output.


Figure 5: DCSK using APLL chaos: (a) transmitted and noisy received signals (VCO control voltage); (b) output of the correlator and received bit sequence, plotted against time (from [11]).

4.2. COHERENT DEMODULATION BY 
CHAOS SYNCHRONIZATION 

CSK was originally described in terms of synchronization of chaotic subsystems in a receiver matched to those in the transmitter [5].

This possibility goes back to Pecora and Carroll at the Naval Research Laboratory, who observed in 1990 [13] that two chaotic systems could be synchronized without an external synchronizing signal. That observation raised the possibility of self-synchronizing coherent demodulators for chaotic transmissions.
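The Pecora-Carroll idea can be sketched with the Lorenz system, a standard example. The receiver copies the y and z equations of the transmitter and is driven by the transmitted x(t) alone. Starting from a different initial condition, the receiver's (y, z) state converges to the transmitter's. The parameter values (sigma = 10, rho = 28, beta = 8/3), the forward-Euler integration, and the initial states are assumptions of this sketch.

```python
def lorenz(x, y, z, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    return sigma * (y - x), x * (rho - z) - y, x * y - beta * z


def pecora_carroll(steps=30000, dt=0.001):
    """Drive a (y, z) receiver subsystem with x(t) from a Lorenz transmitter.

    Uses forward-Euler integration and returns the synchronization error
    |y - y_r| + |z - z_r| after every step.
    """
    x, y, z = 1.0, 1.0, 1.0   # transmitter state
    yr, zr = -5.0, 20.0       # receiver starts from a different initial condition
    errors = []
    for _ in range(steps):
        dx, dy, dz = lorenz(x, y, z)
        # The receiver uses only the y and z equations, evaluated with the transmitted x.
        _, dyr, dzr = lorenz(x, yr, zr)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        yr, zr = yr + dt * dyr, zr + dt * dzr
        errors.append(abs(y - yr) + abs(z - zr))
    return errors


errors = pecora_carroll()
print(f"error after first step: {errors[0]:.3g}, after last step: {errors[-1]:.3g}")
```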


This discovery has generated a great deal of interest in exploiting the properties of chaos for spread spectrum communications. However, recent studies of currently known synchronization schemes [10] suggest that they are not robust enough for use in noisy channels. Better synchronization methods must be developed before an efficient chaotic spread spectrum communication system can be realized.

5. ADDITIONAL CONSIDERATIONS 

In this work, we treat security as an add-on feature of a digital communication system. It may be implemented by adding encryption/decryption hardware at each end of the system, as shown in Fig. 1. Nevertheless, there are strong similarities between discrete-time inverse systems and self-synchronizing stream ciphers [14]. These similarities may permit a hybrid approach to chaotic modulation and encryption.

One advantage of pseudorandom spreading sequences in a spread spectrum system is that multiple users can access the channel simultaneously, provided they use uncorrelated pseudorandom sequences or codes. This is called code division multiple access (CDMA). Further work is required to define an equivalent concept of orthogonality for chaotic spreading signals.

6. ENGINEERING CHALLENGES 

The field of “communicating with chaos” presents many challenging problems at the basic, strategic, and applied levels.

The basic system-level building blocks for a practical chaos-based spread spectrum communication system already exist:

• APLL chaos,
• CSK and DCSK, and
• noncoherent and coherent demodulators.

Nevertheless, all of these subsystems require further research and development.

Three tasks stand out:

• characterize the dynamics of the APLL completely, and develop design rules for constructing robust, reproducible chaotic transmitters;
• determine the performance of CSK and DCSK modulation schemes over noisy channels; and
• analyze the statistical properties of CSK and DCSK transmissions, in order to implement simple and robust receivers.

Current proposals for CSK and DCSK receivers using APLL chaos do not exploit the fact that the modulated signal has been produced by a chaotic system. Exploiting the underlying structure of the transmitted signal, by chaos synchronization or otherwise, should improve receiver performance.
ABSTRACT

In this paper we study the error bounds of constructive 
approximations of a given function with elements taken 
from a prescribed dictionary or subspace. 

This paper helps clarify some recent convergence results on constructive solutions, with applications to several approximation problems.

1. INTRODUCTION 

Continuous functions on compact subsets of $\mathbb{R}^d$ can be uniformly approximated by linear combinations of sigmoidal functions [1] [2]. The approximation error is related to the number of functions used (nodes in a neural network). A first convergence result was given by Maurey, as reported in [3]. More recently, some constructive solutions have been reported in which the iterations involve computations in a reduced subset.

The problem of constructive approximation can be stated as follows: approximate a given element (function) $f$ in a Hilbert space $H$ by an iterative sequence $f_n$, formed as a convex combination of elements taken from a subset $G$ of $H$. This problem has had a major impact on establishing convergence results for projection pursuit algorithms [6], neural network training [5], and classification [7].

The paper is organized as follows. Section 2 states the problem and analyzes Barron’s [5] and Dingankar’s [7] solutions. Section 3 discusses a framework under which those solutions can be formulated. Section 4 analyzes the limits and bounds of the errors. Finally, Section 5 presents the conclusions.

2. PRELIMINARIES AND 
CONSTRUCTIVE RESULTS 

Throughout this paper, $G$ is a subset of a real or complex Hilbert space $H$ with norm $\|\cdot\|$. The elements of $G$ are bounded in norm by some positive constant $b$. $\overline{co}(G)$ denotes the convex closure of $G$, i.e., the closure in $H$ of the convex hull $co(G)$ of $G$.

This work was supported by ISTEC and CICYT.

2.1. THE APPROXIMATION PROBLEM

The first global bound on the error of approximating an element of $\overline{co}(G)$ by convex combinations of $n$ points in $G$ is attributed to Maurey:

Lemma 2.1 Let $f$ be an element of $\overline{co}(G)$ and $c$ a constant such that $c > b^2 - \|f\|^2 = b_f^2$. Then, for each positive integer $n$ there is a point $f_n$ in the convex hull of some $n$ points of $G$ such that:

$$\|f - f_n\|^2 \le \frac{c}{n}$$

The first constructive proof of this lemma was given by Jones [4] and refined by Barron [5]. The proof includes an algorithm that iterates the solution. The result can be stated as follows:

Theorem 2.1 Let $S$ be a constant such that $S > b_f^2$. Then, for each element $f$ in $\overline{co}(G)$, we can construct an iterative sequence $f_n$ such that

$$\|f - f_n\|^2 \le \frac{S}{n}$$

where each $f_n$ is a convex combination of the previous iterate $f_{n-1}$ and some $g_n \in G$, $f_n = (1-\lambda) f_{n-1} + \lambda g_n$.

References [4] and [5] clearly established the relation between this problem and the universal approximation property of sigmoidal networks. Specifically, they proved that, under certain mild restrictions, continuous functions on compact subsets of $\mathbb{R}^d$ belong to the convex hull of the set of sigmoidal functions that one-hidden-layer neural networks can generate. Moreover, since the proofs are constructive, they also provide an algorithm that achieves the theoretical bounds.
Other nonlinear approximation techniques have also benefited from the solution to this problem: approximation by hinging hyperplanes [8], projection pursuit regression [9], and radial basis functions [10]. In all of these problems, the solution can be constrained to lie in the closure of the convex hull of a subset of functions: hinged hyperplanes, ridge functions, or radial basis functions, respectively.

2.2. CONSTRUCTIVE ALGORITHMS 

For the sake of clarity and completeness, we include 
here the proof of the main Theorem, following [5]. The 
proof needs to make use of the following Lemma. 

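Lemma 2.2 Given $f \in \overline{co}(G)$, for each element $h$ of $co(G)$ and $\lambda \in [0,1]$:

$$\inf_{g \in G} \|f - (1-\lambda)h - \lambda g\|^2 \le (1-\lambda)^2 \|f-h\|^2 + \lambda^2 b_f^2 \tag{1}$$

We prove the lemma for $f \in co(G)$. It extends to elements of $\overline{co}(G)$ because all the terms in the inequalities are continuous [11].

Since $f \in co(G)$, there is a convex combination of elements $g_k^*$ from $G$ such that $f = \sum_{k=1}^{m} \alpha_k g_k^*$. Let $g^*$ be a random vector taking values in $H$ with $P(g^* = g_k^*) = \alpha_k$. Then $E(g^*) = f$, and

$$\operatorname{var}(g^*) = E\left(\|g^* - f\|^2\right) = E\left(\|g^*\|^2\right) - \|f\|^2 \le b^2 - \|f\|^2 = b_f^2$$

Now take $\lambda \in [0,1]$ and $d \in H$. Since $E(g^* - f) = 0$, the cross term vanishes and

$$E\left(\|\lambda(g^* - f) + (1-\lambda)d\|^2\right) = \lambda^2 E\left(\|g^* - f\|^2\right) + (1-\lambda)^2\|d\|^2 \le \lambda^2 b_f^2 + (1-\lambda)^2\|d\|^2$$

The infimum over $G$ cannot exceed the average over the values taken by $g^*$. Hence

$$\inf_{g \in G} \|f - (1-\lambda)h - \lambda g\|^2 \le E\left(\|(1-\lambda)h + \lambda g^* - f\|^2\right) = E\left(\|(1-\lambda)(h-f) + \lambda(g^* - f)\|^2\right) \le (1-\lambda)^2\|f-h\|^2 + \lambda^2 b_f^2$$

This concludes the proof of Lemma 2.2.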

We can now prove the main result by induction.

At step 1, find $g_1$ and $\epsilon_1$ such that

$$\|f - g_1\|^2 \le \inf_{g \in G}\|f - g\|^2 + \epsilon_1 \le S$$

This is guaranteed by (1) with $\lambda = 1$, since $b_f^2 < S$.

Now let $f_n$ be our iterative sequence of elements in $co(G)$, and assume that, for $n \ge 2$,

$$\|f - f_{n-1}\|^2 \le \frac{S}{n-1}$$

It is then possible to choose values of $\lambda$ and $\epsilon_n$ such that:

$$(1-\lambda)^2\|f_{n-1} - f\|^2 + \lambda^2 b_f^2 + \epsilon_n \le \frac{S}{n} \tag{2}$$

At step $n$, select $g_n$ such that

$$\|f - (1-\lambda)f_{n-1} - \lambda g_n\|^2 \le \inf_{g \in G}\|f - (1-\lambda)f_{n-1} - \lambda g\|^2 + \epsilon_n \tag{3}$$

Combining (1), (3), and (2) gives $\|f - f_n\|^2 \le S/n$, which completes the proof of Theorem 2.1.
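The constructive algorithm can be tried numerically in a finite-dimensional Hilbert space. The sketch below assumes NumPy and a random dictionary G in R^10. It uses Dingankar's choice λ = 1/n [7] and solves each inner minimization exactly, so ε_n = 0. With λ = 1/n and ε_n = 0, the induction above goes through with S = b_f^2, where b_f^2 = b^2 - ||f||^2. The sketch therefore checks the bound ||f - f_n||^2 <= b_f^2/n at every step.

```python
import numpy as np


def greedy_convex_approx(f, G, n_steps):
    """Constructive approximation f_n = (1 - 1/n) f_{n-1} + (1/n) g_n with g_n in G.

    Each g_n minimizes ||f - f_n|| over the rows of G (exact inner step, eps_n = 0).
    Returns the squared errors ||f - f_n||^2 for n = 1, ..., n_steps.
    """
    f_n = np.zeros_like(f)
    errors = []
    for n in range(1, n_steps + 1):
        lam = 1.0 / n
        candidates = (1.0 - lam) * f_n + lam * G  # one candidate iterate per g in G
        sq_err = np.sum((candidates - f) ** 2, axis=1)
        best = np.argmin(sq_err)
        f_n = candidates[best]
        errors.append(float(sq_err[best]))
    return errors


rng = np.random.default_rng(0)
G = rng.normal(size=(50, 10))              # dictionary: 50 vectors in R^10
weights = rng.dirichlet(np.ones(len(G)))   # convex weights, so f lies in co(G)
f = weights @ G
b = np.linalg.norm(G, axis=1).max()        # ||g|| <= b for every g in G
b_f2 = b ** 2 - f @ f                      # b_f^2 = b^2 - ||f||^2

errors = greedy_convex_approx(f, G, 30)
print(all(e <= b_f2 / n * (1 + 1e-9) for n, e in enumerate(errors, start=1)))
```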


2.3. DISCUSSION 


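The values of $\lambda$ and $\epsilon_n$ in [5] and [7] can be expressed through the parameter $\alpha = S/b_f^2 - 1$ as follows:

[5]: $\lambda = \dfrac{\|f - f_{n-1}\|^2}{b_f^2 + \|f - f_{n-1}\|^2}$, $\qquad \epsilon_n = \dfrac{\alpha S}{n(n+\alpha)}$

[7]: $\lambda = \dfrac{1}{n}$, $\qquad \epsilon_n = \dfrac{\alpha b_f^2}{n^2}$

It is easy to show that the values of $\lambda$ satisfying inequality (2) for positive $\epsilon_n$ fall in the following interval, centered at Barron’s optimal value for $\lambda$:

$$\frac{\|f - f_{n-1}\|^2 \pm \sqrt{\left(b_f^2 + \|f - f_{n-1}\|^2\right)\frac{S}{n} - b_f^2\|f - f_{n-1}\|^2}}{b_f^2 + \|f - f_{n-1}\|^2}$$

To evaluate the possible choices for the bound $\epsilon_n$, we need the induction hypothesis. Substituting it into inequality (2), the values of $\lambda$ must now satisfy

$$(1-\lambda)^2\frac{S}{n-1} + \lambda^2 b_f^2 \le \frac{S}{n}$$

The admissible values of $\lambda$ for positive $\epsilon_n$ then fall in the interval:

$$\frac{1+\alpha}{n+\alpha} \pm \frac{n-1}{n+\alpha}\sqrt{\frac{\alpha(1+\alpha)}{n(n-1)}}$$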

The value $1/n$ falls within this interval, as can easily be checked. This explains why the average of $n$ elements [7] is always a solution to the problem.
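Here is a quick numerical check of the last statement. Substituting the induction hypothesis ||f - f_{n-1}||^2 <= S/(n-1) into (2), with ε_n = 0 and S = (1 + α) b_f^2, gives a quadratic condition on λ:

(n + α) λ^2 - 2(1 + α) λ + (1 + α)/n <= 0

Its roots are

(1 + α)/(n + α) ± ((n - 1)/(n + α)) · sqrt(α(1 + α)/(n(n - 1)))

The sketch evaluates these endpoints for a few (α, n) pairs. It confirms that both 1/n and the center (1 + α)/(n + α) lie inside the interval.

```python
import math


def quadratic(lam, alpha, n):
    """Left-hand side of (n + alpha) lam^2 - 2 (1 + alpha) lam + (1 + alpha)/n <= 0."""
    return (n + alpha) * lam ** 2 - 2 * (1 + alpha) * lam + (1 + alpha) / n


def admissible_interval(alpha, n):
    """Roots of the quadratic above (requires n >= 2 and alpha > 0)."""
    center = (1 + alpha) / (n + alpha)
    half_width = (n - 1) / (n + alpha) * math.sqrt(alpha * (1 + alpha) / (n * (n - 1)))
    return center - half_width, center + half_width


for alpha in (0.1, 1.0, 5.0):
    for n in (2, 5, 50):
        lo, hi = admissible_interval(alpha, n)
        print(f"alpha={alpha}, n={n}: [{lo:.4f}, {hi:.4f}], 1/n={1 / n:.4f}")
```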

3. OPTIMAL PARAMETERS 

We now formulate the following questions:

1. What is the minimum bound for the global error using convex combinations of $n$ elements from $G$?

2. For a given bound, what choice of $\lambda$ maximizes the tolerance allowed for $\epsilon_n$?

Based on the assumptions made so far and on Lemma 2.2, we restate the problem in a more general way. We look for a constructive approximation such that the overall error using $n$ elements from $G$ satisfies

$$\|f - f_n\|^2 \le \frac{S}{b(n)} \tag{4}$$

Here $b(n)$ is a function of $n$ that indicates the order of the approximation ($b(n) = n$ in both [4] and [7]), and $S$ is the parameter related to $b_f^2$ defined before.

In what follows we assume that the iterate $f_n$ is chosen as a convex combination of the previous iterate $f_{n-1}$ and a point $g_n$ in $G$. This entails a loss of generality, since other constructive approaches could re-optimize the coefficients of previous elements of $G$ at each step.

Requiring $f_n$ to be a convex combination of $n$ elements of $G$, and requiring the algorithm to be constructive, means that $f_n$ lies in the convex hull of $\{g_1, g_2, \ldots, g_n\}$ and $f_{n-1}$ lies in the convex hull of $\{g_1, g_2, \ldots, g_{n-1}\}$. As can easily be shown, this does not imply that $f_n$ must be a convex combination of $f_{n-1}$ and $g_n$.

We leave the more general problem for further investigation. Here we concentrate on the case where, as in [4] and [7], constructiveness is taken to mean that at each step $f_n$ lies in the convex hull of $\{f_{n-1}, g_n\}$.

To answer the questions posed at the beginning of this section, we now set up a framework in which constructive results can be formulated.

Let $f_n = (1-\lambda)f_{n-1} + \lambda g_n$. We want to find $\lambda$, $\epsilon_n$, and the function $b(n)$ such that:

$$\|f - f_n\|^2 \le \inf_{0 \le \lambda \le 1} (1-\lambda)^2\|f - f_{n-1}\|^2 + \lambda^2 b_f^2 + \epsilon_n \le \inf_{0 \le \lambda \le 1} (1-\lambda)^2\frac{S}{b(n-1)} + \lambda^2 b_f^2 + \epsilon_n \le \frac{S}{b(n)} \tag{5}$$

Since $S = (1+\alpha)b_f^2$, the last inequality can be rewritten as:

$$\inf_{0 \le \lambda \le 1} (1-\lambda)^2\frac{S}{b(n-1)} + \lambda^2 S + \epsilon_n - \lambda^2\alpha b_f^2 \le \frac{S}{b(n)}$$

This expression represents the trade-off between the global error we are trying to achieve, $S/b(n)$, and the error in each subproblem, $\epsilon_n$.

We now prove the following. Set $\epsilon_n = \lambda^2\alpha b_f^2$. Then, for a given $\lambda$:

• the best achievable rate of convergence, measured in $b(n)$, is the one given in [7] and [5]; and
• the value of $\lambda$ that minimizes $\epsilon_n$ at that best rate is precisely the value given in [7].

To see this, substitute this value of $\epsilon_n$ into (5):

$$(1-\lambda)^2\frac{S}{b(n-1)} + \lambda^2 S \le \frac{S}{b(n)}$$

Hence,

$$\frac{(1-\lambda)^2}{b(n-1)} + \lambda^2 \le \frac{1}{b(n)}$$

and then:

$$P(\lambda) = \lambda^2\left(1 + b(n-1)\right) - 2\lambda + 1 - \frac{b(n-1)}{b(n)} \le 0 \tag{6}$$

For inequality (6) to hold, $P(\lambda)$ must have a nonnegative discriminant. So,

$$4 - 4\left(1 + b(n-1)\right)\left(1 - \frac{b(n-1)}{b(n)}\right) \ge 0$$

and finally:

$$b(n) \ge \left(1 + b(n-1)\right)\left(b(n) - b(n-1)\right) \iff b(n-1) \ge b(n-1)\left(b(n) - b(n-1)\right) \iff b(n) \le 1 + b(n-1) \tag{7}$$

Inequality (7) proves that, under the assumption $\epsilon_n = \lambda^2\alpha b_f^2$, no convex constructive solution of this kind converges faster than the one in references [4] and [7]. The maximum rate is obtained when

$$b(n) = 1 + b(n-1) \implies b(n) = b(1) + n - 1 = n \tag{8}$$
Furthermore, at this rate of convergence the function $P(\lambda)$ has only one zero, since with $b(n) = n$ it reduces to $P(\lambda) = n(\lambda - 1/n)^2$. That zero, $\lambda = 1/n$, is the optimal value and coincides with the one provided by Dingankar’s algorithm.

We will next answer the questions posed at the 
beginning of this section, concerning the limits and 
bounds of the approximation. 


4. BOUNDS FOR THE ERRORS


Looking back at expression (5), note that after applying Lemma 2.2 we have a quadratic problem in $\lambda$ at each step. It consists of minimizing

$$Q(\lambda_n) = (1-\lambda_n)^2\frac{S}{b(n-1)} + \lambda_n^2 b_f^2$$

provided that the induction hypothesis (4) holds for $k < n$. The notation $\lambda_n$ stresses that this parameter varies along the iterative process.

Setting the derivative to zero gives

$$\lambda_n b_f^2 = (1-\lambda_n)\frac{S}{b(n-1)}$$

$$\lambda_n = \frac{S/b(n-1)}{b_f^2 + S/b(n-1)} = \frac{1+\alpha}{1+\alpha+b(n-1)} \tag{9}$$

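Hence, we get the following expression for the optimal error bound:

$$\|f - f_n\|^2 \le \frac{S}{b(n-1)}\left(\frac{b(n-1)}{1+\alpha+b(n-1)}\right)^2 + b_f^2\left(\frac{1+\alpha}{1+\alpha+b(n-1)}\right)^2 + \epsilon_n = \frac{S}{1+\alpha+b(n-1)} + \epsilon_n \le \frac{S}{b(n)} \tag{10}$$

Taking the last inequality in (10) with equality, which gives the largest admissible $\epsilon_n$, we obtain the following relation between $b(n)$ and $\epsilon_n$:

$$\frac{1}{b(n)} - \frac{1}{1+\alpha+b(n-1)} = \frac{\epsilon_n}{S} \tag{11}$$

and then

$$\epsilon_n = \frac{S\left[1+\alpha+b(n-1) - b(n)\right]}{b(n)\left(1+\alpha+b(n-1)\right)} \tag{12}$$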


Since $\epsilon_n \ge 0$, this last expression reveals a fundamental limit on the rate of convergence achievable under the hypotheses made so far:

$$b(n) - b(n-1) \le 1 + \alpha = \frac{S}{b_f^2}$$

Now assume that the partial approximation problems can be solved exactly at each step, so that $\epsilon_n = 0$ for $n \ge 1$. Then

$$b(n) = 1 + \alpha + b(n-1) \implies b(n) = n(1+\alpha) \tag{13}$$

provided that we set $b(1) = 1 + \alpha$. This requires an element $g_1$ in $G$ such that

$$\|f - g_1\|^2 \le \frac{S}{1+\alpha} = b_f^2$$

and such an element is guaranteed by Lemma 2.2. Hence, the best achievable rate of convergence follows the law $c/n$, since

$$\frac{S}{n(1+\alpha)} = \frac{b_f^2}{n}$$

This attains the minimum value of the constant $c$, namely $c = b_f^2$.

Note that reaching this minimum requires

$$\lambda_n = \frac{1+\alpha}{(1+\alpha)n} = \frac{1}{n}$$

so the optimal convex combination is the average of $n$ elements from $G$, as in [7].


The remaining problem is: given the optimal value of $\lambda$, find the maximum $\epsilon_n$ for a fixed convergence rate, which makes the quasi-optimization problem at each step easier to solve. This was already solved explicitly in (12).

To compare our results with [5] and [7], we again assume that the desired rate of convergence is $b(n) = n$.

The value $\lambda = (1+\alpha)/(n+\alpha)$ solves the optimization problem, and:

$$\epsilon_n = \frac{\alpha S}{n(n+\alpha)} \tag{14}$$

This is the best upper bound we can achieve for the partial error at each step of the iteration. It coincides with Barron’s bound. For $n > 1$ it is always greater than the bound $\alpha b_f^2/n^2$ found in [7], since the ratio of the two is $n(1+\alpha)/(n+\alpha) > 1$. To illustrate this, Figure 1 plots the bound $\epsilon_n$ for $n = 5$ and $b_f^2 = 1$ as a function of $\alpha$: the optimal bound is the solid line, and Dingankar’s [7] bound is the dotted line.
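The comparison plotted in Figure 1 can also be tabulated directly. The sketch writes out the two tolerances for b(n) = n explicitly:

• optimal λ = (1 + α)/(n + α): ε_n = αS/(n(n + α)), with S = (1 + α) b_f^2
• λ = 1/n [7]: ε_n = α b_f^2/n^2

It uses n = 5 and b_f^2 = 1, as in the figure.

```python
def eps_optimal(n, alpha, bf2=1.0):
    """Tolerance (14) for b(n) = n: alpha * S / (n (n + alpha)), with S = (1 + alpha) bf2."""
    S = (1 + alpha) * bf2
    return alpha * S / (n * (n + alpha))


def eps_dingankar(n, alpha, bf2=1.0):
    """Tolerance for lambda = 1/n as in [7]: alpha * bf2 / n^2."""
    return alpha * bf2 / n ** 2


n = 5
for alpha in (0.25, 0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"alpha={alpha:<5} optimal={eps_optimal(n, alpha):.4f}  [7]={eps_dingankar(n, alpha):.4f}")
```

The ratio of the two tolerances is n(1 + α)/(n + α). It equals 1 only at n = 1 and grows toward 1 + α as n increases.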
5. CONCLUSIONS

In this paper we have studied a theoretical framework for constructive algorithms based on convex combinations of elements from a subset of a Hilbert space. We have derived the optimal coefficients of the convex expansions that guarantee a desired convergence rate. We have also studied the trade-off between global and partial errors for that optimal value.
ABSTRACT

We have developed a Hopfield-based neural network for the shortest path problem and applied it to a competitive strategy for call admission and routing of permanent virtual circuits in ATM networks. This neural network is an order of magnitude faster than previous approaches. Within the competitive strategy, its performance is equivalent to that of Dijkstra’s optimal algorithm.

1. INTRODUCTION 

The design of the Broadband ISDN requires effective call admission and routing procedures that provide a predefined quality of service without detailed knowledge of the traffic characteristics. Competitive Strategies provide a framework for addressing these problems [1].

These strategies can achieve provably good performance without knowledge of the traffic patterns. Their structure can be summarized as follows:

• The formulation of a cost for every arc in the communication network. This cost is exponential in the current congestion of the arc, which leads to an approximation of the optimal multicommodity flow solution in polynomial time;

• The solution of the Shortest Path Problem (SPP), as presented below, with these costs;

• If the minimum path for the SPP has a cost lower than a threshold (to be defined in Sec. 3), the call is accepted and routed through that path.

The SPP can be briefly formulated as follows [6]. Let G(N, L, C) be a directed graph, where N is a set of nodes, L is a set of ordered pairs of nodes in N (links), and C is a cost for links. Undirected graphs can be considered if, for every link (na, nb) in L, the reverse link (nb, na) also belongs to L. We define the cost of a path in G as the sum of the costs of the links in that path. The SPP for a pair of nodes (n1, n2) is to find the path from n1 to n2 with the lowest cost.

Although there are very efficient algorithms that solve the SPP exactly, such as Dijkstra's or Fulkerson's methods [6], it is interesting to consider new approaches with a direct and fast hardware implementation, such as neural networks, which have already been applied to this problem [3].
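For reference, the exact SPP solver used later as the OPT baseline can be sketched as a textbook Dijkstra implementation over G(N, L, C) with nonnegative link costs. The dictionary-of-dictionaries graph representation and the name `dijkstra_path` are illustrative choices, not taken from the paper.

```python
import heapq


def dijkstra_path(costs, source, sink):
    """Shortest path from `source` to `sink` with Dijkstra's algorithm.

    costs: dict mapping node -> {neighbour: link cost}; all costs must be >= 0.
    Returns (total_cost, path), or (float('inf'), []) if sink is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == sink:
            break
        for v, c in costs.get(u, {}).items():
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if sink not in dist:
        return float("inf"), []
    # Walk predecessors back from the sink to recover the path.
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[sink], path[::-1]


# Directed example: the two-hop route 1 -> 2 -> 3 beats the direct link 1 -> 3.
links = {1: {2: 1.0, 3: 5.0}, 2: {3: 1.0}, 3: {}}
print(dijkstra_path(links, 1, 3))  # (2.0, [1, 2, 3])
```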

The Hopfield model and its applications to optimization problems are well described in texts such as [7]. It consists of a set of nonlinear processors (often sigmoids), called neurons, connected to each other with (usually) symmetric weights. These weights define a Lyapunov function for the system; this function describes its dynamics towards a set of attractor points, which represent the set of solutions of the combinatorial problem.

Two different approaches based on the Hopfield net have been proposed in the literature for the SPP [3]: one of them formulates the problem with a quadratic cost function for the neural network; the other, coming from a remarkable paper by Ali [2], uses a linear cost function. Some advantages of the linear formulation are reported in [2]: essentially, a reduction of complexity (measured as the number of neurons in the neural net), an increase in the obtained performance, better scalability, and the absence of assumptions about the path length that would restrict the optimality of the solution.

The proposed neural network also has a linear cost formulation, inheriting the advantages mentioned before. A further reduction in complexity is accomplished by limiting the neurons in the neural net to those representing links actually present in the communication network: then, for a given connectivity (average number of links per node) of the communication network, the number of neurons in the neural net grows linearly with the size of the communication network, instead of the quadratic growth in Ali's neural net. The speed is also increased by about an order of magnitude with respect to Ali's net.

2. A HOPFIELD-BASED NEURAL 
NETWORK FOR THE SPP 

The proposed neural network, following [2], is a matrix of $n \times n$ neurons, where element $V_{ij}$ represents the link from node i to node j; after convergence, it can take two logical values, '0' and '1'. In principle, we can assume a correspondence between neuron values 0 and 1 and logical values '0' and '1', respectively. After convergence, the selected path is defined as the set of links whose neurons are in logical state '1'.

To separate the validity of solutions from the optimization of the linear cost, we follow [4, 8]: the equations of motion of the neural network are designed to provide valid solutions, while the linear cost optimization is achieved by setting the initialization point to an appropriate value.

The conditions for obtaining valid solutions can be summarized as follows:

• All neurons representing non-existing links must converge to state '0';

• There has to be one and only one link starting at the source (S) node of a path, and one and only one link ending at the sink (T) node of that path;

• For every node of a path other than the source and the sink, there must be one and only one link arriving at the node and one and only one link leaving it;

• No separated loops can exist.
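The validity conditions above can be checked directly on a converged, already decoded 0/1 neuron matrix. The sketch below assumes 0-based node indices, a 0/1 adjacency matrix `L` and a decoded matrix `V`; the helper name `is_valid_route` is hypothetical. The last condition is tested by following the chain of '1' links out of S and checking that it uses every selected link.

```python
import numpy as np


def is_valid_route(V, L, S, T):
    """Check the route validity conditions on a binary neuron matrix (sketch).

    V: (n, n) array of 0/1 states, V[i, j] = 1 selects the link i -> j
    L: (n, n) 0/1 adjacency matrix of the communication network
    S, T: source and sink node indices (0-based)
    """
    V = np.asarray(V)
    n = V.shape[0]
    # Condition 1: no selected neuron may represent a non-existing link.
    if np.any((V == 1) & (np.asarray(L) == 0)):
        return False
    out_deg, in_deg = V.sum(axis=1), V.sum(axis=0)
    # Condition 2: exactly one link leaves S and exactly one enters T
    # (the inhibition matrix also forbids links into S and out of T).
    if out_deg[S] != 1 or in_deg[T] != 1 or in_deg[S] != 0 or out_deg[T] != 0:
        return False
    # Condition 3: every other node is unused or has one link in and one out.
    for k in range(n):
        if k not in (S, T) and not (in_deg[k] == out_deg[k] and in_deg[k] <= 1):
            return False
    # Condition 4: the chain from S must reach T and use every selected link;
    # any leftover '1' belongs to a separated loop.
    node, used = S, 0
    while node != T:
        node = int(np.argmax(V[node]))  # the unique successor, by condition 3
        used += 1
        if used > n:
            return False
    return used == int(V.sum())
```

For instance, in a 5-node full mesh the route 0 -> 1 -> 4 is valid, but adding the disconnected loop 2 -> 3 -> 2 makes it invalid.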

To prevent the appearance of non-existing links, links arriving at S, or links leaving T, we define the Inhibition Matrix M: for i from 1 to n, M[i,S] = M[T,i] = 0; and if L[i,j] = 0, then M[i,j] = 0.
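A minimal sketch of the inhibition matrix follows. It assumes 0-based indices and that every entry not zeroed by the rules above equals 1 (the text states only which entries are 0). `inhibition_matrix` is a hypothetical helper name.

```python
import numpy as np


def inhibition_matrix(L, S, T):
    """Build M from the 0/1 adjacency matrix L (sketch; other entries = 1)."""
    M = (np.asarray(L) != 0).astype(float)  # M[i, j] = 0 wherever L[i, j] = 0
    M[:, S] = 0.0  # no link may arrive at the source S
    M[T, :] = 0.0  # no link may leave the sink T
    return M


# Three fully connected nodes, source 0 and sink 2.
M = inhibition_matrix(np.ones((3, 3)) - np.eye(3), S=0, T=2)
```

Multiplying a neuron matrix element-wise by M zeroes every neuron that corresponds to a forbidden link.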

The equations of motion for the Hopfield network are

$$dV_{ij} = A\, d_tV_{ij}\, M[i,j]\, dt \qquad (1)$$

where $d_tV_{ij}$ is different for the neurons in the Source row and the Sink column than for the rest of the neurons in the network, being given by Eqs. (2)-(4), with the remaining terms defined as follows:


$$EG1 = \sum_p (V_{ip} - V_{pi}) \qquad (5)$$

$$EG2 = \sum_p (\ldots) \qquad (6)$$

$$EG3 = \ldots + \ldots \qquad (7)$$
Terms EG1 and EG2 try to impose that the number of incoming links to a node be equal to the number of outgoing links from that node; these terms are the same as in [2] for neurons outside the Source row or Sink column, but for those lines the constraint is changed to take into account that it does not apply to the Source and Sink nodes. Term EG3 tries to avoid more than one '1' in a row or in a column; it prevents more than one outgoing or more than one incoming link at a node and thus avoids loops through nodes belonging to the selected path. Note that in [2] the avoidance of loops was left to the linear minimization of the total cost, so loops remained possible when their costs were zero or very low. Parameter C forces one '1' in the source row and another '1' in the sink column. Finally, parameter D protects the network from spurious loops disconnected from the selected path.

We have selected the parameters A, B, C, and D so as to obtain valid solutions to the routing problem by requiring valid routes to be stable attractors of the Hopfield dynamical system, which is achieved by imposing, for every valid route:

• $d_tV_{ij} > 0$ for every neuron $V_{ij}$ that has to be in state '0' to represent a valid route;

• $d_tV_{ij} < 0$ for every neuron $V_{ij}$ that has to be in state '1' to represent a valid route.

These relations immediately lead to the following constraints on the parameters:

• 0 < C < 1

• 0 < B + D

• 0 < … < 0

To make the restrictions on parameter D consistent with its role of suppressing spurious loops ($D \ne 0$), it is necessary to redefine the logical states: a neuron $V$ is in state '1' if $V > th1$, and in state '0' if $V < th0$, where th0 and th1 are thresholds satisfying $0 < th0 < th1 < k_{min}$, with $k_{min}$ given in Fig. 1 as a function of D for paths with a maximum number of hops, n, from 3 to 20. The values of these thresholds also determine the speed of the network and the probability of retrieving valid solutions. For further discussion of this topic, see [5].
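The two-threshold reading of the neuron outputs can be sketched as follows; the values th0 = 0.1 and th1 = 0.55 are those used in Section 4. Neurons between the two thresholds are reported as undecided (-1), a labelling chosen here for illustration; the upper bound $k_{min}$ from Fig. 1 is not encoded. `decode_states` is a hypothetical name.

```python
import numpy as np


def decode_states(V, th0, th1):
    """Map neuron outputs to logical states with two thresholds (sketch).

    Returns an int array: 1 where V > th1, 0 where V < th0, and -1 where the
    neuron lies between the thresholds (no valid logical state yet).
    """
    if not 0 < th0 < th1:
        raise ValueError("thresholds must satisfy 0 < th0 < th1")
    V = np.asarray(V, dtype=float)
    states = np.full(V.shape, -1, dtype=int)
    states[V > th1] = 1
    states[V < th0] = 0
    return states


states = decode_states([0.02, 0.3, 0.8], th0=0.1, th1=0.55)
```

A converged net should leave no -1 entries; the decoded 0/1 matrix can then be checked for route validity.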



Figure 1: Minimum values to represent logical state


As the optimization problem is linear, the choice of the initialization point of the neural net can be used as a good heuristic to perform the optimization [8]. This approach separates the optimization problem from that of obtaining valid and fast solutions, avoiding aliasing (i.e., linear superposition of optimization and constraint terms) in the Lyapunov function formulation. Assuming that the costs, $a_{ij}$, lie in [0, 1], the initialization point is selected to be:

$$V_{ij} = (1 - a_{ij})\, M[i,j]\, \eta \qquad (8)$$

where multiplication by the inhibition matrix makes the neurons representing invalid arcs converge trivially to 0, and $\eta$ is a penalty for long paths.

The neural algorithm described so far reduces complexity and increases speed by about an order of magnitude. For a given connectivity, the complexity of this net grows as the number of links in the network rather than as the square of the number of nodes. However, the percentage of optimum paths obtained with this net drops to about 84% in the experiments.


To obtain better performance, a bank of nets is used; the initialization vectors for the bank of nets are given by

$$V_{ij}^{k} = (1 - a_{ij})\, M[i,j]\, \eta + \nu\, n_{ij}[k] \qquad (9)$$

where $\nu = 0.1$ (except for one of the nets, for which $\nu = 0$) and $n_{ij}[k]$ is a random number in [0, 1].
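Eqs. (8) and (9) can be sketched as below. Assumptions: `a` holds the normalised costs in [0, 1], `M` is the inhibition matrix, $\eta$ (`eta`) is supplied by the caller, the noise $n_{ij}[k]$ is uniform on [0, 1], and net 0 is the noiseless one. One further assumption, labelled in the code, is that the noise is also masked by M, so that neurons of forbidden links still start at 0 (with the M factor in Eq. (1) they would otherwise never move). `bank_initial_states` is a hypothetical name.

```python
import numpy as np


def bank_initial_states(a, M, eta, n_nets, nu=0.1, seed=None):
    """Initial states for a bank of nets, after Eqs. (8)-(9) (sketch).

    a:   (n, n) normalised link costs in [0, 1]
    M:   (n, n) inhibition matrix
    eta: penalty for long paths
    Net 0 is the noiseless start of Eq. (8); the others add nu * noise.
    """
    rng = np.random.default_rng(seed)
    base = (1.0 - a) * M * eta  # Eq. (8)
    states = [base.copy()]
    for _ in range(1, n_nets):
        noise = rng.uniform(0.0, 1.0, size=a.shape)
        # Assumption: noise is masked by M so forbidden links still start at 0.
        states.append(base + nu * noise * M)
    return np.stack(states)
```

Because the nets in the bank do not interact, each one can be run in parallel from its own starting point, and the best valid route among them is kept.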

The total number of neurons in the bank of nets is chosen to equal the number of neurons in Ali's net, so that the complexity is comparable. The percentage of optimum solutions then grows to 97%, the speed of the bank is essentially that of one neural net (an order of magnitude higher than Ali's net), and robustness naturally increases owing to the parallelism.


3. THE COMPETITIVE STRATEGY 


We take a simplified model of a competitive strategy for Permanent Virtual Circuit (PVC) routing from Plotkin [1]. The objective of this strategy is to maximize the benefit of the network, which is done by selecting the most profitable set among the incoming requests. As in any other competitive strategy, a cost is defined for every link that is exponential in the current load of the link; once a request is received, the minimum cost path is calculated for it. The call is admitted if the cost of the optimum route is below a threshold that depends on the profit provided by the call, i.e., if the trade-off between the cost of network resources and the profit provided by the call is favorable to the network.

The communication network is represented by a 
graph G(V,E,u), where V is the set of nodes, E the set 
of (directed) edges, and u a function from E to the set 
of positive real numbers which represents the capacity 
of the edges. 

The traffic in the network is represented by a sequence of requests $\beta_i$, where each request is described by $\beta_i = (s_i, t_i, r_i, p_i)$: nodes $s_i$ and $t_i$ are the source and the destination of request i, $r_i$ is the bandwidth required by the request, and $p_i$ is the profit obtained if the request is accepted. As we consider only permanent circuits, the holding time is infinite, so no timing parameters of the original model are needed except the order index i.

The relative load on edge e, just before considering the jth request, is defined by

$$\lambda_e(j) = \sum_{i<j,\; e \in P_i} \frac{r_i}{u(e)} \qquad (10)$$

where $P_i$ is the path for the ith request; the load on the network, just before considering the jth request, results
Let $H_{max}$ be the maximum number of hops allowed for a path in the communication network, and let $\mu = 2\ldots$; to establish a feasible competitive method, we need to impose some constraints:

• $1 < \ldots = H_{max}$

• $\ldots \min_{e \in E} u(e) \ldots$

Then, the competitive strategy for throughput maximization can be formulated as follows:

1. For the ith arriving call to be admitted, it must be checked whether there exists a path P from $s_i$ to $t_i$ satisfying the condition

$$\sum_{e \in P} \frac{r_i}{u(e)} \left(\mu^{\lambda_e(i)} - 1\right) \le p_i \qquad (12)$$

2. If such a path exists, accept the call and route it on a path P satisfying the above condition.
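A compact sketch of the admission rule follows. Assumptions, also flagged in the code: the cost of link e for request i is taken as $(r_i/u(e))(\mu^{\lambda_e(i)} - 1)$, in the style of Plotkin's throughput-competitive routing [1]. The value of $\mu$ is supplied by the caller. The minimum-cost path is found by exhaustively enumerating simple paths of at most $H_{max}$ hops, which is practical only for small networks such as the 13-node example of Section 4. All names are hypothetical.

```python
def admit_request(capacity, load, s, t, r, p, mu, h_max):
    """Competitive admission test for one request (sketch).

    capacity: dict (i, j) -> u(e) for each directed edge e = (i, j)
    load:     dict (i, j) -> bandwidth already routed on e (updated in place)
    s, t, r, p: source, destination, bandwidth and profit of the request
    Returns the accepted path as a list of nodes, or None if rejected.
    """
    succ = {}
    for (i, j) in capacity:
        succ.setdefault(i, []).append(j)

    def link_cost(e):
        rel_load = load.get(e, 0.0) / capacity[e]  # relative load, as in Eq. (10)
        # Assumed exponential cost: (r / u(e)) * (mu**rel_load - 1).
        return r / capacity[e] * (mu ** rel_load - 1.0)

    best_cost, best_path = float("inf"), None
    stack = [(s, [s], 0.0)]
    while stack:  # depth-first enumeration of simple paths with <= h_max hops
        node, path, cost = stack.pop()
        if node == t:
            if cost < best_cost:
                best_cost, best_path = cost, path
            continue
        if len(path) - 1 == h_max:
            continue
        for nxt in succ.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt], cost + link_cost((node, nxt))))

    if best_path is None or best_cost > p:
        return None  # reject: no path, or the resources cost more than the profit
    for e in zip(best_path, best_path[1:]):
        load[e] = load.get(e, 0.0) + r  # route the permanent circuit
    return best_path
```

On an empty network every path costs zero, so the first request is always accepted; as links fill up, their exponential cost pushes later requests onto less loaded routes or causes them to be rejected.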

4. SIMULATIONS 


This section presents results of simulating the competitive strategy of Section 3 with Dijkstra's optimum algorithm [6] and with the Hopfield algorithm of Section 2 for the SPP computation; we refer to the two algorithms as OPT and HOP, respectively. The objective is to compare, after saturation of the network (the simulations were stopped after 100 consecutive rejected requests for both algorithms), the profit obtained by both algorithms versus the required resources. Some conclusions are then drawn.

The simulation characteristics are as follows: the communication network has 13 nodes and a connectivity of 2.62; link capacities are normalized to 1; $H_{max}$ is set to 10; parameter $r_i$ takes a random value in [0, 0.2]; and the source and destination nodes are chosen randomly among the nodes in the network.

For the Hopfield algorithm: A = 1, B = 0.9, C = 0.7, D = 0.05; dt = 0.1; th0 = 0.1, th1 = 0.55. Note from Fig. 1 that the values selected for th1 and D force the maximum number of hops in a path to be n = 7, although this number was never reached in the experiments because of its high associated cost. Since the competitive strategy costs were not constrained to [0, 1], they have been mapped to the costs $a_{ij}$ used in Eqs. (8) and (9) as


$$a_{ij} = \frac{\mathrm{cost}[i,j] - \mathrm{min}}{\mathrm{max} - \mathrm{min}} \qquad (13)$$


where cost[i,j] is the cost assigned to link L[i,j], and max and min are the maximum and minimum of those costs, respectively.
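Assuming Eq. (13) is the usual min-max normalisation $a_{ij} = (\mathrm{cost}[i,j] - \mathrm{min})/(\mathrm{max} - \mathrm{min})$ taken over the existing links (the equation is only partly legible in the source), the mapping can be sketched as follows. The name `normalise_costs` and the choice to return all zeros when every cost is equal are assumptions.

```python
import numpy as np


def normalise_costs(cost, mask):
    """Min-max map of link costs into [0, 1] over existing links (sketch).

    cost: (n, n) array of competitive-strategy link costs
    mask: (n, n) boolean array, True where the link L[i, j] exists
    """
    c = np.asarray(cost, dtype=float)
    lo, hi = c[mask].min(), c[mask].max()
    a = np.zeros_like(c)
    if hi > lo:  # if all costs are equal, every a_ij is left at 0
        a[mask] = (c[mask] - lo) / (hi - lo)
    return a  # non-existing links stay at 0; M masks them anyway
```

The cheapest link is mapped to 0, so by Eq. (8) it receives the largest initial neuron value.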

We select the number of nets in the bank so as to have roughly the same number of neurons (170) as Ali's net (169), so that the complexity is comparable. The percentage of optimum solutions obtained with the bank is about 97%. It would be interesting to study how many nets the bank needs to reach a fixed optimization performance, in order to see how the total complexity (measured as the number of neurons in the bank) grows. Nonetheless, it is worth noting that the neurons in different nets evolve independently of each other, so increasing the number of nets in the bank could be expected to give better results than increasing the number of neurons in Ali's net.

Fig. 2 gives a global comparison between these algorithms, showing the relative difference in profit between HOP and OPT versus the resources used to allocate the calls (load). These results are averages over 10 realizations. On average, our Hopfield net needs 240 iterations to reach convergence, against the 3000-5000 iterations needed in [2].

Let us briefly discuss these results. The inhibition matrix reduces the number of active neurons in the Hopfield net to the number of links in the communication network, so that, if the connectivity of the network is fixed, the complexity of the Hopfield net grows as O(n), where n is the number of nodes. The use of a bank of nets obviously multiplies the number of neurons by the number of nets in the bank, but further research is needed to determine how the number of nets must grow with n for a fixed performance. Nonetheless, the use of a bank increases the optimization performance without affecting the speed of the whole system, and it provides additional robustness. We repeat that the neural network needs an average of 240 iterations to converge, versus 3000-5000 iterations in Ali's previous work [2], i.e., less than 10% of that number.


5. CONCLUSIONS AND FURTHER 
RESEARCH 

We have developed a Hopfield net for the SPP that, compared to existing neural nets for the SPP, provides good scalability, reduced complexity and higher speed, but lower solution quality. If a bank of nets with a complexity comparable to Ali's neural net is considered, the quality obtained in the competitive strategies is equivalent to that of Dijkstra's optimum algorithm, and the speed remains about an order of magnitude better than that of Ali's net.

Figure 2: Relative performance of the neural net versus Dijkstra's algorithm (relative difference (%) in profit, HOP - OPT).

A better understanding of the dynamics of the net, and of its ability to provide optimal solutions, would be of interest; this issue is closely related to the capacity of the Hopfield net. The development of distributed Hopfield algorithms for routing is also of interest, as current research in networking seems to suggest.
ABSTRACT

The canonical piecewise linear (CPL) structure is used for nonlinear filtering, and its nonlinear approximation capacity is shown by using piecewise linear partitions. It is also shown that a CPL network can approximate a given nonlinear continuous function with any degree of accuracy. This result is extended to show that CPL networks can be used to equalize a nonlinear channel. We show that if the distribution of the equalizer output is the same as the distribution of the sequence at the input of the nonlinear channel, then the global system is the identity (except for a sign factor) under some regularity conditions. Thus, distribution learning [1] by maximizing an appropriate objective function achieves blind equalization of nonlinear channels.

1. INTRODUCTION 

The traditional theory of signal processing is built upon the assumption of linearity. The linear filter, with or without feedback, whose output is a linear combination of the input signal, is easy both to implement and to analyze, and has been widely used in a variety of applications. However, most practical systems are better approximated by nonlinear models. Moreover, the growing demand for signal processing in more demanding environments, to achieve very high data rates, has driven the need to improve on existing methods by using nonlinear filters. Even simple nonlinear models can successfully capture types of system behavior that cannot be described with linear models. However, nonlinear models also pose great challenges because of their complex structure and dynamics. The statistical problems of model identification are correspondingly more intricate, and introducing nonlinearity into the filter's operation increases the implementation complexity.

Several approaches have been proposed for the estimation and characterization of nonlinear models. Polynomial or Volterra filters [13], which assume that the nonlinear function can be represented as an expansion of polynomial terms, are probably the best known method of nonlinear filtering. Although sufficiently high order polynomials can yield a small asymptotic probability of error, they will also, in general, converge very slowly. Neural network structures, such as the multilayer perceptron and radial basis functions, have also been introduced as nonlinear filters (e.g. [6]), but they require a large amount of training time and large network sizes. The recurrent neural network (RNN) equalizer [10] can accurately model the inverse of a nonlinear communication channel with a smaller network size, but because of its complex nonlinear structure, its dynamics are very hard to explain. Also, for blind equalization, it is quite difficult to incorporate statistical information into the network structure because of the highly nonlinear structure of a general RNN.

Piecewise linear models constitute a compromise between the complexity of nonlinear approximation and the theoretical richness of the linear domain. Piecewise linear models have proven very useful in control engineering [14], nonlinear circuit analysis, and other nonlinear problems [3]. Here, we study the approximation and dynamical properties of a special kind of piecewise linear function, the canonical piecewise linear (CPL) network [3], and present its application to nonlinear filtering, particularly to blind equalization of nonlinear channels. The CPL network offers the following benefits: (1) it makes use of standard linear adaptive filtering techniques to perform training tasks and allows for efficient selection of the partition boundaries; (2) it offers savings in computation time and implementation cost, especially when required to model strong nonlinearities; (3) because of its piecewise linear nature, it allows for easy incorporation of known statistical information into the network structure.

In this paper, we first prove the nonlinear approximation capacity of the CPL network by using piecewise linear partitions, show that a CPL network can approximate a given nonlinear continuous function with any degree of accuracy, and explain the dynamics of learning in the CPL network. We then extend this result to show that CPL networks can be used to equalize a nonlinear channel, and present a proof of this ability of the CPL equalizer for a general nonlinear channel. The theoretical results previously reported for this property have always assumed linear distorting channel characteristics [2],[4]. We show that if the distribution of the output random variable of the equalizer is the same as the distribution of the sequence at the input of the nonlinear channel, then the global system is the identity (except for a sign factor) under some regularity conditions. Thus, distribution learning [1] by maximizing an appropriate objective function achieves blind equalization of nonlinear channels.

2. REPRESENTATION AND CAPACITY 
OF CPL NETWORK 

The CPL network is defined as [3]: 

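Definition 1 (Canonical Piecewise-Linear Function): A piecewise linear function $f: D \to Q$, with a compact subset $D \subset \mathbb{R}^N$ and a compact subset $Q \subset \mathbb{R}^M$, is called a canonical piecewise linear (CPL) function if it can be expressed by the global representation

$$f(x) = a + Bx + \sum_{i=1}^{r} c_i \left| \langle \alpha_i, x \rangle + \beta_i \right| \qquad (1)$$

where $B \in \mathbb{R}^{M \times N}$, $a, c_i \in \mathbb{R}^M$, $\alpha_i \in \mathbb{R}^N$ and $\beta_i \in \mathbb{R}$.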
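Evaluating the global representation (1) takes only a few array operations, which reflects the compact storage mentioned below. The sketch stores the vectors $c_i$ and $\alpha_i$ as the rows of `c` and `alpha`; these shapes and the name `cpl` are choices made for illustration.

```python
import numpy as np


def cpl(x, a, B, c, alpha, beta):
    """Evaluate f(x) = a + B x + sum_i c_i |<alpha_i, x> + beta_i|, Eq. (1).

    x: (N,), a: (M,), B: (M, N), c: (r, M) with rows c_i,
    alpha: (r, N) with rows alpha_i, beta: (r,).
    """
    x = np.asarray(x, dtype=float)
    s = np.abs(alpha @ x + beta)  # (r,) absolute values of the boundary functions
    return a + B @ x + c.T @ s


# max(0, x) = 0.5 x + 0.5 |x| is a one-term, one-dimensional CPL function.
a = np.array([0.0])
B = np.array([[0.5]])
c = np.array([[0.5]])
alpha = np.array([[1.0]])
beta = np.array([0.0])
y_pos = cpl([3.0], a, B, c, alpha, beta)   # equals [3.0]
y_neg = cpl([-2.0], a, B, c, alpha, beta)  # equals [0.0]
```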

The CPL representation requires only a minimal amount of memory to store the parameters of multidimensional piecewise linear functions. Since the domains of the functions describing such models are partitioned into polyhedral regions where the functions are linear throughout, all the nonlinearities are localized at the region boundaries. This makes them much more amenable to analysis than virtually any other type of nonlinear function. Moreover, the class of CPL functions is closed, since the composition and the inverse (if it exists) of CPL functions is again a canonical piecewise linear function [9]. Thus, a CPL function provides us with a minimum number of boundaries to partition the training patterns, and in each partitioned region we can use a linear model to approximate the given mapping. While the representation capability of (1) has been studied in [3], this network was proposed and used without proof of its approximation capability. In [11], the CPL network is considered as a special case of the multilayer perceptron model, which is taken as evidence of its approximation ability; however, no attempt is made to construct a proof demonstrating the claim and the connection with the multilayer perceptron model.

Before discussing the capacity of the CPL network, we introduce the following definitions and lemmas:

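Definition 2 (Nondegenerate Partition): A partition

$$\langle \alpha_i, x \rangle + \beta_i = 0, \qquad i = 1, 2, \ldots$$

is said to be nondegenerate if, for every set of linearly dependent vectors $\alpha_{i_1}, \alpha_{i_2}, \ldots, \alpha_{i_m}$, $m < q$, the rank of the matrix $[\alpha_{i_1}, \alpha_{i_2}, \ldots, \alpha_{i_m}]$ is strictly less than the rank of the $(N+1) \times m$ matrix

$$\begin{bmatrix} \alpha_{i_1} & \alpha_{i_2} & \cdots & \alpha_{i_m} \\ \beta_{i_1} & \beta_{i_2} & \cdots & \beta_{i_m} \end{bmatrix}$$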

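Definition 3 (Consistent Variation [3]): A function $f: D \to Q$, with a compact subset $D \subset \mathbb{R}^N$ and a compact subset $Q \subset \mathbb{R}^M$, is said to possess the consistent variation property if and only if

1) $f$ has a linear partition, and

2) $J_{i1}^{+} - J_{i1}^{-} = J_{i2}^{+} - J_{i2}^{-} = \cdots = J_{iq}^{+} - J_{iq}^{-}$,

where $J_{ij}^{+}$ and $J_{ij}^{-}$ denote the Jacobian matrices of the regions $R_{ij}^{+}$ and $R_{ij}^{-}$, respectively, which are separated by the boundary

$$\langle \alpha_i, x \rangle + \beta_i = 0.$$

Here $j = 1, 2, \ldots, q$, and the $q$ pairs of regions are separated by this boundary such that $\langle \alpha_i, x \rangle + \beta_i > 0$ for $x \in R_{ij}^{+}$ and $\langle \alpha_i, x \rangle + \beta_i < 0$ for $x \in R_{ij}^{-}$.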
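The consistent variation property can be checked numerically for a CPL function, which also shows why such functions possess it. Inside a region every sign $\sigma_i = \mathrm{sgn}(\langle \alpha_i, x\rangle + \beta_i)$ is fixed, so the Jacobian is $B + \sum_i \sigma_i c_i \alpha_i^T$. Crossing boundary $i$ therefore changes it by $2 c_i \alpha_i^T$, whichever pair of regions is involved. The parameter layout follows the sketch of Eq. (1) given earlier ($c_i$ and $\alpha_i$ as rows); `cpl_jacobian` is a hypothetical name.

```python
import numpy as np


def cpl_jacobian(x, B, c, alpha, beta):
    """Jacobian of the CPL function (1) at a point x off every boundary (sketch).

    Inside a region each sign sigma_i = sgn(<alpha_i, x> + beta_i) is fixed, so
    f is linear there with Jacobian B + sum_i sigma_i c_i alpha_i^T.
    """
    sigma = np.sign(alpha @ np.asarray(x, dtype=float) + beta)
    return B + (c.T * sigma) @ alpha


# f(x) = |x_1| + 2 |x_2|: scalar output (M = 1), boundaries x_1 = 0 and x_2 = 0.
B = np.zeros((1, 2))
c = np.array([[1.0], [2.0]])  # rows c_i
alpha = np.eye(2)             # rows alpha_i
beta = np.zeros(2)

# Jump across boundary 1 for two different region pairs (x_2 > 0 and x_2 < 0).
jump_upper = cpl_jacobian([1, 1], B, c, alpha, beta) - cpl_jacobian([-1, 1], B, c, alpha, beta)
jump_lower = cpl_jacobian([1, -1], B, c, alpha, beta) - cpl_jacobian([-1, -1], B, c, alpha, beta)
```

Both jumps equal $2 c_1 \alpha_1^T = [2, 0]$, as the definition requires.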

Lemma 1 [3]: If the domain space of a continuous piecewise linear function $f$ is partitioned by a set of nondegenerate linear partition boundaries, then $f$ has the consistent variation property.

Lemma 2 (Necessary and Sufficient Condition) [2]: A piecewise linear function $f$ has a canonical piecewise linear representation if and only if it possesses the consistent variation property.

We prove the following theorem based on the nondegenerate partition property:
Theorem 1: Let the domain $D$ be a compact space of $N$ dimensions and let $\mathcal{P}$ be the set of canonical piecewise linear functions on $D$. Then, for any continuous function $f$ on $D$ and any $\epsilon > 0$, there exists a function $\hat{f} \in \mathcal{P}$ such that $|f(x) - \hat{f}(x)| < \epsilon$ for all $x \in D$.

Proof. Since $f$ is continuous on $D$, for every $\xi \in D$ there is a sequence $\{I_n(\xi)\}$ of closed intervals containing $\xi$ whose lengths converge to zero, such that

$$\|f(x) - f(\xi)\| < \epsilon/2, \qquad x \in I_n(\xi) \qquad (2)$$

Then the set of intervals

$$\mathcal{I} = \{I_n(\xi) : \xi \in D,\ n = 1, 2, \ldots\}$$

covers $D$ in the Vitali sense [7]. Hence, by the Vitali Covering Theorem [7], there are a finite number of disjoint intervals $I_{n_1}(\xi_1), I_{n_2}(\xi_2), \ldots, I_{n_k}(\xi_k)$ covering all of $D$, and in each interval $I_{n_i}(\xi_i)$ condition (2) holds for $i = 1, 2, \ldots, k$. On the other hand, because linear functions are continuous, there exist Jacobian matrices $J_{n_i}(\xi_i)$ and vectors $v_{n_i}(\xi_i)$ such that

$$\|J_{n_i}(\xi_i)\,x + v_{n_i}(\xi_i) - f(\xi_i)\| < \epsilon/2, \qquad x \in I_{n_i}(\xi_i),\ i = 1, 2, \ldots, k \qquad (3)$$

and they take the same value at common boundaries. Define $\hat{f}(x)$ as follows:

$$\hat{f}(x) = J_{n_i}(\xi_i)\,x + v_{n_i}(\xi_i), \qquad x \in I_{n_i}(\xi_i);$$

then, from (2) and (3), we have

$$\|f(x) - \hat{f}(x)\| < \epsilon, \qquad x \in D.$$

Since $I_{n_i}(\xi_i)$, $i = 1, 2, \ldots, k$, are closed intervals in $D$, they can be obtained by partitioning $D$ with a set of nondegenerate partitions. Thus, by Lemmas 1 and 2, the piecewise linear function $\hat{f}(x)$ possesses a CPL representation.

Hence, any continuous nonlinear channel on a compact domain can be approximated to any desired accuracy by a CPL function, and furthermore, if we use a CPL network as an equalizer, then the global system is still a CPL function.


3. BLIND EQUALIZATION BY CPL 
NETWORK 

Blind equalizers are a special class of equalizers that determine their parameters from the statistics of the channel input and the measured output when training sequences are not available. Since almost all blind equalizers, such as [5], are developed for linear channels, these algorithms suffer severe performance degradation on unknown nonlinear channels. An RNN-based blind equalizer is proposed in [10]; however, that equalizer does not guarantee the consistency of the cost function it uses. In this work, we establish the consistency of the estimation scheme by using a CPL equalizer for nonlinear channels, as follows:

Let the global system $T = \mathcal{F}(S)$ (a cascade of a nonlinear channel $S$ and the CPL equalizer $\mathcal{F}$) be the CPL network (1), and let $\{x(n)\}$ be an i.i.d. random variable with distribution $\nu$, $x(n) \in D$. Assume that the CPL network (1) divides the input space into $m$ disjoint regions, and that in each region $R_i$, (1) is equivalent to the following linear model:

$$M_i:\ \hat{x}(n) = \sum_j w_{ij}\, x_j(n) \qquad (4)$$

The basic assumptions on $T$ and $\nu$ are the following.

(i) The distribution $\nu$ is symmetric with finite variance.

(ii) For each model $M_i$, there exist at least a subset $I \subset D$ and a region $R_i$ with $\mathbf{I} = I \times I \times \cdots \times I \subset R_i$ such that the mapping

$$\hat{x}(n) = \sum_j w_{ij}\, x_j(n)$$

is a mapping of $\mathbf{I}$ onto $I$.


Then, we have the following conclusion: 

Theorem 2: Consider the global system $T = \mathcal{F}(S)$. Assume that $\{x(n)\}$ is an i.i.d. process with distribution $\nu$ and that assumptions (i) and (ii) are satisfied. If the distribution of $\{\hat{x}(n)\}$ is still $\nu$, then the global system $T$ is the identity except for a possible delay and a sign factor.


Proof. Since $\{x(n)\}$ and $\{\hat{x}(n)\}$ have the same distribution and $\nu$ is symmetric, $E\{x(n)\} = E\{\hat{x}(n)\} = 0$ and

$$\ldots = \int \cdots \int x^2 \, d\nu_{x_1}\, d\nu_{x_2} \cdots d\nu_{x_k}$$

where $\bar{I} = -I$ and $\bar{\mathbf{I}} = (-I) \times (-I) \times \cdots \times (-I)$. Then, … here, $\rho = \int_I d\nu$. We can easily obtain that $\rho^{\ldots} = 1$. Let $f$ be the characteristic function of $x(n)$ on $I$; we have [8]

$$f(\tau) = \prod_j f(w_{ij}\tau) \qquad (5)$$

Let $g = |f|$ and let $G$ be the distribution function corresponding to $g$. From (5), … Setting $\psi(\tau) = -\ln g(\tau)/\tau^2$, we have

$$\psi(\tau) = \ldots$$

which can be rewritten as

$$\ldots = 0 \qquad (6)$$

It follows from (6) that, for any $\tau$, there exists at least one $w_{ij}$ such that $\psi(\tau) \le \psi(w_{ij}\tau)/\rho^{k-1}$. Since $\nu$ has finite variance, $\psi(0)$ exists and we get

$$\psi(0) \le \psi(0)/\rho^{k-1} \qquad (7)$$

From (6), it also follows that for every $\tau$ there exists a $q$ such that $\psi(\tau) \ge \psi(w_{iq}\tau)/\rho^{k-1}$; therefore,

$$\psi(0) \ge \psi(0)/\rho^{k-1} \qquad (8)$$

Both (7) and (8) hold if and only if $\rho^{k-1} = 1$, i.e., $k = 1$. This means that the model $M_i$ has only one nonzero coefficient $w_{il}$, and $w_{il}^2 = 1$. This proves the theorem.

Hence, to solve the blind equalization problem, we have to adjust the tap values of the CPL equalizer so that the instantaneous distribution of the equalizer output $\hat{x}(n)$ converges to the input distribution $\nu$. Several cost functions, such as the moment error objective function, the generalized Godard/Sato objective function and the Shannon entropy, can be used for distribution learning. In the next section, we present a blind equalization algorithm based on the moment error objective function and the partial likelihood function, together with simulation results showing that the CPL blind equalizer outperforms the constant modulus algorithm [5],[15] by orders of magnitude when equalizing nonlinear channels.

4. EXAMPLE

Assume that the only available information is the channel observations $y_n, y_{n-1}, \ldots, y_0$ and the statistics of the channel input, $E[x^j]$, $j = 1, 2, \ldots$. Let $P_w(x_n \mid \mathcal{F}_n)$ be the estimated probability mass function (pmf) of the input sequence $x_n$, where $\mathcal{F}_n$ is the σ-field generated by the events $[y_n, y_{n-1}, \ldots, y_1, y_0]$. By [1], distribution learning can be achieved by maximizing the partial log-likelihood function, i.e., $\max_w \sum_i \ln P_w(x_i \mid \mathcal{F}_i)$. The true channel input $x_i$ is not known, but it can be shown that maximizing the partial likelihood is equivalent to maximizing the quasi-partial log-likelihood function $\sum_{i=1}^{n} \ln P_w(\hat{x}_i \mid \mathcal{F}_i)$, $\hat{x}_i \in S$, with respect to $w$ and $\hat{x}_i \in S$. Using this result, the blind equalization algorithm is as follows:

• Start with an initial estimate of $w$.

• Maximize the quasi-partial log-likelihood function with respect to $\hat{x}_i$.

• Maximize the quasi-partial log-likelihood function with respect to $w$, based on the updated $\hat{x}_i$.

• Repeat these steps until the algorithm converges.

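For the binary communication channel, $S = \{-1, 1\}$, we choose

$$P_w(\hat{x}_n \mid \mathcal{F}_n) = \begin{cases} \ldots & \text{if } \hat{x}_n = 1 \\ \ldots & \text{if } \hat{x}_n = -1 \end{cases}$$

where

$$J(\hat{x}_n) = \sum_{j=1} \eta_j \left(E[x^j] - \hat{E}[\hat{x}^j(n)]\right)^2, \qquad \hat{E}_n[\hat{x}^j] = \frac{1}{n}\left((n-1)\,\hat{E}_{n-1}[\hat{x}^j] + \hat{x}^j(n)\right)$$

$\hat{x}(n)$ is the output of the CPL equalizer, and the $\eta_j$ are positive constants. Our blind algorithm and the CMA algorithm [5], [15] are tested on the equalization of the nonlinear channel

$$G(z) = 1 + 0.7z^{-1} + 0.15\,(1 + 0.7z^{-1})^{\ldots} + 0.1\,(1 + 0.7z^{-1})^{\ldots} + 0.05\,(1 + 0.7z^{-1})^{\ldots}$$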

Even for this relatively simple communication channel, the CMA-based equalizer performs very poorly. In Figure 1, we plot the BER curves for both equalizers. The results show that the CPL-based blind equalizer outperforms the CMA equalizer by orders of magnitude. If we choose the test channel as $G(z) = 1 + 0.7z^{-1}$, Figure 2 shows that the CPL blind equalizer and the CMA algorithm have comparable performance.
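The moment-error part of the objective used in this example can be sketched as follows. Assumptions: i.i.d. 2-PAM symbols, the linear test channel $G(z) = 1 + 0.7z^{-1}$, moments $j = 1, \ldots, 4$, unit weights $\eta_j$, and the running update $\hat{E}_n = ((n-1)\hat{E}_{n-1} + \hat{x}^j(n))/n$. The "equalizer" is a hand-picked exact inverse filter, not the CPL network, and no partial-likelihood training is performed. The sketch only shows that matching the 2-PAM moments drives $J$ towards zero. All names are hypothetical.

```python
import numpy as np


def running_moments(xhat, orders):
    """Running estimates E_n[xhat^j] = ((n - 1) E_{n-1} + xhat(n)^j) / n."""
    orders = np.asarray(orders)
    est = np.zeros(len(orders))
    for n, v in enumerate(xhat, start=1):
        est = ((n - 1) * est + v ** orders) / n
    return est


def moment_error(xhat, true_moments, eta):
    """J = sum_j eta_j (E[x^j] - E_n[xhat^j])^2 (moment error objective)."""
    orders = np.arange(1, len(true_moments) + 1)
    est = running_moments(xhat, orders)
    return float(np.sum(eta * (np.asarray(true_moments) - est) ** 2))


rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=5000)           # i.i.d. 2-PAM symbols
y = x + 0.7 * np.concatenate(([0.0], x[:-1]))    # linear channel 1 + 0.7 z^-1

# Hand-picked exact inverse (not the CPL equalizer): xhat(n) = y(n) - 0.7 xhat(n-1).
xhat = np.zeros_like(y)
for n in range(len(y)):
    xhat[n] = y[n] - 0.7 * (xhat[n - 1] if n > 0 else 0.0)

pam_moments = [0.0, 1.0, 0.0, 1.0]  # E[x], E[x^2], E[x^3], E[x^4] for 2-PAM
eta = np.ones(4)
j_raw = moment_error(y, pam_moments, eta)    # unequalized channel output
j_eq = moment_error(xhat, pam_moments, eta)  # equalized output
```

For the unequalized output, $E[y^2] = 1.49$ and $E[y^4] \approx 4.18$, so $J$ is large. The inverse-filtered output reproduces the 2-PAM moments up to sampling error.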
5. CONCLUSIONS

In this paper, the canonical piecewise linear structure is introduced as a blind equalizer. The mapping ability of the CPL network is studied. A methodology for studying identical distribution and blind equalization is presented for a global CPL system. A blind algorithm is derived based on the moment error objective function and the partial likelihood function. The simulation results demonstrate that the CPL-based equalizer outperforms the CMA equalizer by orders of magnitude when equalizing a nonlinear channel, and that the two perform similarly in linear channel equalization.
Figure 1: 2-PAM Nonlinear Blind Equalization

Figure 2: 2-PAM Linear Blind Equalization
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ABSTRACT 

We introduce a unified statistical framework for real¬ 
time signal processing with neural networks by using a 
recent extension of maximum likelihood (ML) estima¬ 
tion, partial likelihood (PL) estimation theory, which 
allows for (i) dependent observations and (ii) sequen¬ 
tial processing. For a general neural network condi¬ 
tional distribution model and for the general case of 
dependent observations, we establish a fundamental 
information-theoretic relationship for PL estimation, and show its equivalence to relative entropy minimization. We study the dynamics of relative entropy minimization (maximum partial likelihood estimation) within the well-formed cost functions framework and show that these are well-formed cost functions; hence, their gradient descent minimization is guaranteed to converge to a solution if one exists. The formulation is applied
to adaptive channel equalization and simulation results 
are presented to show the ability of the least relative 
entropy equalizer to realize complex decision bound¬ 
aries and to recover during training from convergence 
at the wrong extreme in cases where the mean square
error based MLP equalizer cannot.

1. INTRODUCTION 

Statistical parameter estimation theory has as its fundamental support maximum likelihood (ML) estimation, which provides estimators with good large-sample optimality properties that are invariant with respect to functions of the parameters. However, ML theory is traditionally developed for independent observations, whereas a majority of signal processing applications require processing of dependent observations. In this paper, we introduce a conditional distribution learning framework for real-time signal processing with neural networks based on partial likelihood (PL) theory [4], [10]. Obtained as a partial factorization of the full likelihood, PL also possesses the desirable large-sample properties of ML and, more importantly, it can easily be characterized for dependent data and sequential processing. Hence, it overcomes the difficulties of other extensions of ML for dependent data, such as conditional likelihood, which, for easy specification, requires that the auxiliary information be known for the whole period (i.e. including future observations) [7]. Other problems with alternative factorizations of the likelihood for dependent data are initial state specification requirements (e.g. when using Markovian representations for the data) and difficulties in dealing with missing data. Therefore, PL provides a particularly suitable formulation for real-time signal processing, which most of the time requires on-line processing of dependent observations.

We introduce a general neural network conditional 
probability model, and for this model, establish a key 
information-theoretic connection, namely the equiva¬ 
lence of maximum PL estimation and accumulated rel¬ 
ative entropy (ARE) minimization. Hence, distribution 
learning using relative entropy between the true and es¬ 
timated probability mass functions can be achieved by 
maximum PL estimation which does not require that 
the true conditionals be known (which in general are 
not available). This result can be regarded as the ex¬ 
tension of the ML and minimum ARE equivalence for 
independent and identically distributed (i.i.d.) data [9] 
to the general case of dependent observations. While 
providing the theoretical foundation for statistical anal¬ 
ysis of maximum PL estimation, this connection can 
also be used to derive a new class of real-time signal 
processing algorithms based on information-theoretic 
alternating projections [3]. 

We then consider a perceptron probability model for binary distribution learning and its application to adaptive channel equalization. For the MLP model, we derive the least relative entropy (LRE) algorithm by gradient optimization and show that it possesses favorable dynamical properties that can be beneficial to the channel equalization problem. In particular, it is shown that LRE can always recover from convergence at the wrong extreme, whereas mean-squared-error (MSE) based gradient descent learning on the MLP cannot. This property of the algorithm is discussed within the well-formed cost functions framework of Wittner and Denker [8], which states that gradient descent learning on such cost functions is always guaranteed to find a solution if one exists. In a gradient descent dynamics framework, it has been shown that the MSE cost function is not a well-formed cost function [8]; therefore, finding a solution cannot always be guaranteed. With MLPs, it is also often the case that MSE based learning algorithms cannot recover from convergence at a wrong extreme.

2. DISTRIBUTION LEARNING BY 
PARTIAL LIKELIHOOD ESTIMATION 

Consider the discrete valued sequence $\{x_k\}$ taking values from the alphabet $S = \{a_0, a_1, \dots, a_M\}$. Define the $\sigma$-field $\mathcal{F}_{n-1} = \sigma\{1, x_{n-1}, \dots, x_1, x_0;\ y_n, \dots, y_1, y_0\}$, where the inclusion of 1 is only a mathematical convenience. We parametrize the conditional probability mass function (pmf) by a neural network as follows:

$$P_\theta(x_n \mid \mathcal{F}_{n-1}) = f\big(x_n, g(\mathbf{y}_n; \theta)\big). \qquad (1)$$

Here, $\theta$ is the vector of network weights, $\theta \in \Theta$ where $\Theta$ is a compact parameter set, and $\mathbf{y}_n = [y_n, y_{n-1}, \dots, y_{n-N+1}]$. The term $g(\mathbf{y}_n; \theta)$ is the output of the neural network, and $f(\cdot)$ and $g(\cdot)$ are continuous and differentiable functions. Since $\mathcal{F}_{n-1}$ includes the entire history, the network can assume a recurrent structure as well. The task is then to estimate the conditional pmf $p_\theta(x_n \mid \mathcal{F}_{n-1})$, or the conditional probabilities

$$P_\theta(x_n = a_j \mid \mathcal{F}_{n-1}) \quad \forall\, a_j \in S.$$

In (1), $f(\cdot)$ is included to account for the functional representation of the pmf using the $M$ conditional probabilities. An additional constraint on (1) is that $f(\cdot)$ has to be chosen such that $\sum_{a_j \in S} P_\theta(x_n = a_j \mid \mathcal{F}_{n-1}) = 1$. Neural network learning (extracting the information represented by the data through adaptation of the weights $\theta$) can now be viewed as a distribution learning problem, i.e. estimation of the parameters of the conditional pmf, such that the PL function given by

$$\mathcal{L}^P(\mathbf{x}_n; \theta) = \mathcal{L}^P(\theta) = \prod_{i=1}^{n} p_\theta(x_i \mid \mathcal{F}_{i-1}) \qquad (2)$$

is maximized.
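As a concrete instance of (2), the sketch below evaluates $\ln \mathcal{L}^P(\theta)$ for the binary alphabet $\{0, 1\}$ with a single logistic unit, $P_\theta(x_n = 1 \mid \mathcal{F}_{n-1}) = 1/(1 + \exp(-\theta^T\mathbf{y}_n))$, i.e. the model of (1) with no hidden layer. That model choice and the function name are assumptions made only to keep the example short. Each factor conditions only on the regressor $\mathbf{y}_n$ available at time $n$, which is what lets the PL be accumulated sequentially.

```python
import math

def log_partial_likelihood(theta, xs, ys):
    """ln L^P(theta) of eq. (2) for binary x_n in {0, 1} under a logistic model.

    ys[n] is the regressor y_n = [y_n, ..., y_{n-N+1}] built from data available
    at time n; the single logistic unit is an illustrative assumption.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(t * v for t, v in zip(theta, y))
        p = 1.0 / (1.0 + math.exp(-z))  # P_theta(x_n = 1 | F_{n-1})
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total
```

Maximizing this sum over $\theta$ is the maximum PL estimate; the product in (2) becomes a sum of per-sample log terms, so it can be updated one sample at a time.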


3. EQUIVALENCE TO RELATIVE 
ENTROPY MINIMIZATION 

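The relative entropy (RE), or Kullback-Leibler distance [6], is a fundamental information-theoretic measure of how accurately the estimated conditional pmf $p_\theta(x_n \mid \mathcal{F}_{n-1})$ approximates the true conditional pmf $p_{\theta_0}(x_n \mid \mathcal{F}_{n-1})$ (assuming the latter is realized by $\theta_0$ for (1)). We define the accumulated relative entropy (ARE) at time $n$ as $\mathcal{I}_n(\theta) = \sum_{k=1}^{n} i_k(\theta)$, where the relative entropy distance is $D_k(p_{\theta_0} \| p_\theta) = i_k(\theta) = E\{r_k(\theta) \mid \mathcal{F}_{k-1}\}$, and $J_n(\theta) = \sum_{k=1}^{n} j_k(\theta)$, where $j_k(\theta) = \mathrm{Var}\{r_k(\theta) \mid \mathcal{F}_{k-1}\}$, with $r_k(\theta) = \ln \dfrac{p_{\theta_0}(x_k \mid \mathcal{F}_{k-1})}{p_\theta(x_k \mid \mathcal{F}_{k-1})}$.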
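For the binary case treated in Section 4, each conditional relative entropy $i_k(\theta)$ is a Kullback-Leibler divergence between two Bernoulli pmfs, and $\mathcal{I}_n(\theta)$ is their running sum. The sketch below computes it directly from given true and model conditional probabilities $P(x_k = 1 \mid \mathcal{F}_{k-1})$; in practice the true conditionals are unknown, which is exactly why the equivalence with maximum PL estimation established next is useful. The function names are hypothetical.

```python
import math

def bernoulli_kl(p0, p):
    """D(p0 || p) in nats for Bernoulli pmfs with P(x = 1) = p0 (true) and p (model)."""
    kl = 0.0
    for a, b in ((p0, p), (1.0 - p0, 1.0 - p)):
        if a > 0.0:  # convention: 0 * ln(0 / b) = 0
            kl += a * math.log(a / b)
    return kl

def accumulated_relative_entropy(true_probs, model_probs):
    """I_n = sum_{k=1}^{n} i_k for sequences of conditionals P(x_k = 1 | F_{k-1})."""
    return sum(bernoulli_kl(p0, p) for p0, p in zip(true_probs, model_probs))
```

Each term is zero only when the model conditional matches the true one at time $k$, so $\mathcal{I}_n(\theta)$ grows with every time step at which the model is wrong.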

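Based on the theory of PL [10], we establish the relationship between maximum PL (MPL) estimation and ARE minimization by the following theorem:

Theorem: If there exist a constant $\delta > 0$ and continuous functions $f(\cdot)$ and $g(\cdot)$ such that, for each $\theta \neq \theta_0$, as $n \to \infty$,

$$P\big(\mathcal{I}_n(\theta)/n > \delta\big) \to 1 \qquad (3)$$

and

$$J_n(\theta)/n^2 \to 0 \quad \text{in probability}, \qquad (4)$$

then

$$\arg\min_{\theta} \mathcal{I}_n(\theta) - \arg\max_{\theta} \ell^P(\theta) \to 0 \qquad (5)$$

almost surely on $D = \{\mathcal{I}_n(\theta) \uparrow \infty,\ \sum_{i=1}^{\infty} \ldots\}$, where $\ell^P(\theta) = \ln \mathcal{L}^P(\theta)$.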

Note that the first condition of the theorem, (3), represents the rate at which the Kullback-Leibler information accumulates with $n$, and guarantees that for each $\theta \neq \theta_0$, $\mathcal{I}_n(\theta) \to \infty$ as $n \to \infty$, i.e. the information continues to accumulate. The second condition, (4), on the other hand, implies asymptotic stability of the variance. Thus, the maximum PL estimate $\hat{\theta}$ also minimizes the ARE distance between the true and estimated conditional distributions asymptotically, providing an estimate of the true parameter $\theta_0$. We emphasize that the result holds for the general case of dependent observations and hence generalizes the ML and ARE equivalence shown for independent observations [9]. The proof of this theorem is given in [2], where we also establish consistency and asymptotic normality of partial likelihood estimates for the conditional probability model of (1).

4. PERCEPTRON CONDITIONAL
PROBABILITY MODEL

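If we consider the special case where the sequence $x_n$ takes values from the binary alphabet $S = \{0, 1\}$, the problem reduces to estimation of the conditional probability $P(x_n = 1 \mid \mathcal{F}_{n-1})$. The PL function is then characterized as:

$$\mathcal{L}^P(\theta) = \prod_{i=1}^{n} P_\theta(x_i = 1 \mid \mathcal{F}_{i-1})^{x_i}\,\big(1 - P_\theta(x_i = 1 \mid \mathcal{F}_{i-1})\big)^{1 - x_i}. \qquad (6)$$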

Consider the following single hidden layer MLP structure as the conditional pmf model:

$$P_\theta(x_n = 1 \mid \mathcal{F}_{n-1}) = g(\mathbf{s}_n^T \mathbf{v}), \qquad s_n^i = h(\mathbf{y}_n^T \mathbf{w}^i), \qquad (7)$$

where $\mathbf{w}^i \in \mathbb{R}^N$ is the weight vector between the input layer and hidden node $i$ ($i = 1, \dots, q$, with $q$ the number of hidden nodes), $\mathbf{y}_n \in \mathbb{R}^N$ is the observation vector, and $v^i$ is the weight between hidden node $i$ and the output node. We represent the entire set of weights by $\theta = [\mathbf{W}, \mathbf{v}]$. The hidden node activation function $h(\cdot)$ is chosen to ensure network approximation capabilities [9]; e.g. it can be chosen as the familiar logistic or the radial basis function. However, for learning parameters by gradient descent minimization, note that $g(\cdot)$ has to be chosen such that $g'(\cdot) > 0$.

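If we choose both $g(\cdot)$ and $h(\cdot)$ as sigmoidal functions, i.e.

$$g(\mathbf{s}_n^T\mathbf{v}) = \frac{1}{1 + \exp(-\mathbf{s}_n^T\mathbf{v})}, \qquad s_n^i = h(\mathbf{y}_n^T\mathbf{w}^i) = \frac{1}{1 + \exp(-\mathbf{y}_n^T\mathbf{w}^i)} \qquad (8)$$

for $i = 1, \dots, q$, with $\mathbf{s}_n = [s_n^0, s_n^1, \dots, s_n^q]^T$, gradient descent minimization of the negative log PL cost function results in the following updates:

$$v_{n+1}^i = v_n^i + \mu_1 s_n^i e_n, \qquad i = 0, \dots, q \qquad (9)$$

$$\mathbf{w}_{n+1}^i = \mathbf{w}_n^i + \mu_2\, \mathbf{y}_n v_n^i s_n^i (1 - s_n^i)\, e_n, \qquad i = 1, \dots, q \qquad (10)$$

where $s_n^0 = 1$, $e_n = x_n - g(\mathbf{s}_n^T\mathbf{v}_n) = x_n - P_\theta(x_n = 1 \mid \mathcal{F}_{n-1})$, and $\mu_1$ and $\mu_2$ are the step sizes at the output and hidden layers, respectively.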
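The updates (9)-(10) are ordinary backpropagation in which the output error $e_n$ is passed back without being multiplied by $g'(\cdot)$. The following plain-Python sketch implements one LRE step for the sigmoidal model (7)-(8), assuming $s_n^0 = 1$ acts as an output bias; the function names and the list-of-lists weight layout are illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W, v, y):
    """Model (7)-(8). W[i-1] is w^i (i = 1..q); v = [v^0, ..., v^q]; s^0 = 1 (assumed bias)."""
    s = [1.0] + [sigmoid(sum(wk * yk for wk, yk in zip(w, y))) for w in W]
    p = sigmoid(sum(vi * si for vi, si in zip(v, s)))  # P(x_n = 1 | F_{n-1})
    return s, p

def neg_log_pl(W, v, y, x):
    """Contribution of one sample (x in {0, 1}) to the negative log PL of (6)."""
    _, p = forward(W, v, y)
    return -(x * math.log(p) + (1 - x) * math.log(1.0 - p))

def lre_step(W, v, y, x, mu1, mu2):
    """One LRE update, eqs. (9)-(10); the output error e_n is not scaled by g'."""
    s, p = forward(W, v, y)
    e = x - p
    v_new = [vi + mu1 * e * si for vi, si in zip(v, s)]
    # Hidden layer: e_n is backpropagated through v^i and h'(.) = s^i (1 - s^i),
    # using the weights v_n from before this update.
    W_new = [[wk + mu2 * e * v[i + 1] * s[i + 1] * (1.0 - s[i + 1]) * yk
              for wk, yk in zip(w, y)]
             for i, w in enumerate(W)]
    return W_new, v_new, e
```

Because these are exact gradient steps, they can be checked numerically: with $\mu_1 = \mu_2 = 1$, the change in each weight should equal minus the finite-difference derivative of `neg_log_pl` with respect to that weight.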

The binary equalization problem can be rephrased as follows in order to comply with the development in [8]. For the remainder of the section, assume without loss of generality that the nonlinearity is the hyperbolic tangent, an odd function. Therefore, the network output lies in $(-1, 1)$. (Note that the transformation to the probability measure is immediate by applying the transformation $\frac{1}{2}[(\cdot) + 1]$.)


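Divide the training set $\mathcal{Y} = \{\mathbf{y}_n,\ n = 1, \dots, M\}$ into two disjoint subsets specified by the desired output: $\mathcal{B}_1$ contains the vectors $\mathbf{y}_n$ whose desired output is positive (i.e. $x_n = 1$), and $\mathcal{B}_2 = \mathcal{Y} \setminus \mathcal{B}_1$ contains the remaining ones.

Now, define $\mathcal{B} = \mathcal{B}_1 \cup \{-\mathbf{y}_n \mid \mathbf{y}_n \in \mathcal{B}_2\}$, so that the solution set can be defined as $\Theta^* = \{\theta \mid \text{the network output is positive for every } \mathbf{y}_n \in \mathcal{B}\}$.

Next, we state the definition of a well-formed cost function in the sense of Wittner and Denker [8]. Consider cost functions of the form

$$J(\theta) = \sum_{i=1}^{M} \nu(\mathbf{y}_i; \theta). \qquad (12)$$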

Definition: The cost function $J(\cdot)$ is well-formed if $\nu(\cdot)$, viewed as a function of the network output $s$ for each pattern, is differentiable and satisfies the following:

(i) For all $s$, $-\nu'(s) \ge 0$ ($\nu(\cdot)$ does not push in the wrong direction).

(ii) There exists $\varepsilon > 0$ such that $-\nu'(s) \ge \varepsilon$ for all $s \le 0$ ($\nu(\cdot)$ keeps pushing if there is a misclassification).

(iii) $\nu(\cdot)$ is bounded below.

Proposition 1: If the cost function is well-formed, then gradient descent is guaranteed to enter $\Theta^*$, provided $\Theta^*$ is not empty.

Proof: See [8].

Proposition 2: The negative log PL cost

$$J(\theta) = -\ln \mathcal{L}^P(\theta)$$

is well-formed.

With the hyperbolic tangent as the nonlinearity and for the target $x$, $\nu$ satisfies

$$-\nu'(s) = e,$$

with $e = x - \tanh(s)$. In the rephrased version of the binary equalization problem developed in this section, the target is $x = 1$; therefore $-\nu'(s) = 1 - \tanh(s)$ and

(i) $-\nu'(s) = 1 - \tanh(s) > 0$,

(ii) $-\nu'(s) = 1 - \tanh(s) \ge 1$ for $s \le 0$,

(iii) $\nu(s) = -\ln\frac{1 + \tanh(s)}{2} \ge 0$, so $\nu$ is bounded below.

Therefore, gradient descent on the negative log PL cost function is guaranteed to find a solution provided that one exists. As is well known, there is no such guarantee with the MSE cost function when used on MLPs, even on those without any hidden units.
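The contrast with MSE can be checked numerically. For a tanh output and target $x = 1$, taking the MSE cost as $\nu(s) = \frac{1}{2}(1 - \tanh s)^2$ gives $-\nu'(s) = (1 - \tanh s)(1 - \tanh^2 s)$, which tends to zero as $s \to -\infty$, so condition (ii) fails; the negative log PL gives $-\nu'(s) = 1 - \tanh s \ge 1$ for $s \le 0$. The sketch below evaluates both expressions on a grid; the grid and the function names are illustrative.

```python
import math

def neg_dnu_log_pl(s, x=1.0):
    """-nu'(s) for the negative log PL cost with tanh output: e = x - tanh(s)."""
    return x - math.tanh(s)

def neg_dnu_mse(s, x=1.0):
    """-nu'(s) for the MSE cost (x - tanh(s))^2 / 2."""
    t = math.tanh(s)
    return (x - t) * (1.0 - t * t)

grid = [k / 10 for k in range(-100, 101)]  # s in [-10, 10]
for s in (-10.0, -5.0, -1.0, 0.0):
    print(f"s={s:6.1f}  log-PL: {neg_dnu_log_pl(s):.4f}  MSE: {neg_dnu_mse(s):.2e}")
```

Deep in the misclassified region the MSE "push" is many orders of magnitude smaller than the log-PL one, which is the mechanism behind the recovery behavior discussed next.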

Some further aspects of gradient descent learning on the ARE cost function are considered in [1]. In particular, the dynamics are studied by considering the parameter updates, and it is shown that for LRE the backpropagated output error is a non-vanishing control signal; hence, the algorithm can always recover from convergence at the wrong extreme, while the MSE based MLP may not.
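A toy illustration of this control-signal argument (not a reproduction of the experiments in [1]): a single sigmoid unit with one weight is started at the wrong extreme, $w = -10$, for a pattern with target 1. Under LRE the weight moves by $\mu e_n$, while MSE backpropagation multiplies the same error by $g'(\cdot) = p(1 - p)$, which is nearly zero in saturation. The step size, starting point and iteration budget are arbitrary choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def iterations_to_recover(rule, w0=-10.0, mu=1.0, max_iter=1000):
    """Single sigmoid unit, one pattern (y = 1, target x = 1), started at the
    wrong extreme w0. Returns the first iteration at which the decision is
    correct (p > 0.5), or None if that never happens within max_iter."""
    w, x, y = w0, 1.0, 1.0
    for n in range(1, max_iter + 1):
        p = sigmoid(w * y)
        e = x - p
        if rule == "lre":
            delta = e                    # LRE: error backpropagated unscaled
        elif rule == "mse":
            delta = e * p * (1.0 - p)    # MSE: error scaled by g'(.) = p(1 - p)
        else:
            raise ValueError(f"unknown rule: {rule}")
        w += mu * delta * y
        if sigmoid(w * y) > 0.5:
            return n
    return None

print("LRE:", iterations_to_recover("lre"))
print("MSE:", iterations_to_recover("mse"))
```

With these settings LRE should cross the decision boundary in about a dozen iterations, whereas the MSE update stays stuck for the whole budget because its control signal is of order $e^{w}$.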

The ability of the algorithm to track large variations during training can be quite beneficial for channel equalization. For example, in LEOS (Low Earth Orbit Satellite) communication systems, these abrupt
changes occur quite frequently. Due to the Doppler 
shift, combined with multi-path reflections, the channel 
characteristics undergo an abrupt change as the chan¬ 
nel is switched from one satellite (usually receding with 
a negative carrier shift) to the next successive satel¬ 
lite (usually approaching with a positive carrier shift). 
Another typical case is in land mobile communications 
where multiple cells are transmitting the same infor¬ 
mation (usually with a small frequency offset) to cover 
an entire area. In this case the channel variation oc¬ 
curs when the mobile unit switches reception from one 
antenna to another one having a stronger signal at that 
particular point. In the next section, we present simu¬ 
lation results to demonstrate these dynamics for LRE 
and MSE minimizations in practical channel equaliza¬ 
tion schemes. 

5. APPLICATION TO ADAPTIVE 
CHANNEL EQUALIZATION 

In this section, we present application of partial likeli¬ 
hood estimation with neural networks to adaptive chan¬ 
nel equalization. We consider a simple binary pulse 
amplitude modulation (PAM) data transmission sys¬ 
tem. The supervised adaptive channel equalization 
problem is posed such that the probability that the transmitted signal $x_n$ takes the value 1 from the binary alphabet is to be determined from a training sequence given the finite past of the received signal: $\mathbf{y}_n = [y_n, y_{n-1}, \dots, y_{n-N+1}]$. The equalizer structure is shown in Fig. 1.

We study the performance of the LRE algorithm 
given in (9)-(10) as follows: First, we present test re¬ 
sults to demonstrate the capability of the structure to 
realize complex decision boundaries and of the algo¬ 
rithm to learn parameters to achieve these boundaries. 
This is done for minimum and non-minimum phase 
channels at different SNR levels. We then present simulation results to demonstrate the ability of the algorithm to track abrupt changes during training, a property discussed in [1]. The performance of LRE is compared with that of steepest descent learning (backpropagation) based on the MSE criterion for the same structure, i.e. the perceptron model, since this is the structure we have considered in this paper.

For the first simulation study, we consider two simple multipath channels, $H(z) = 1 + 0.5z^{-1}$ and $H(z) = 0.5 + z^{-1}$, i.e. a minimum phase and a nonminimum phase channel, respectively. Figures 2(a) and 2(b) show the decision regions for the first channel, and Figures 3(a) and 3(b) show the regions for the second channel, at approximately 21 dB and 11 dB SNR, respectively. The MLP structure used in these figures is 2-7-1. As observed in both cases, LRE successfully learns the coefficients that achieve the given partition. These results compare favorably with those presented in [5] for the MLP equalizer based on the MSE criterion.
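One way to see why the two channels behave differently is to enumerate the noise-free channel states, i.e. the observation vectors $[y_n, y_{n-1}]$ produced by every combination of $x_n, x_{n-1}, x_{n-2} \in \{\pm 1\}$ for a two-input equalizer. The sketch below, an illustration rather than the authors' code, checks whether the trivial boundary $y_n = 0$ separates the two classes: it does for $1 + 0.5z^{-1}$, but not for the nonminimum phase $0.5 + z^{-1}$, where the equalizer has to learn a different boundary.

```python
from itertools import product

def channel_states(h, n_inputs=2):
    """Noise-free pairs (x_n, (y_n, ..., y_{n-n_inputs+1})) for an FIR channel h."""
    states = []
    for bits in product((1, -1), repeat=n_inputs + len(h) - 1):
        # bits[0] = x_n, bits[1] = x_{n-1}, ...
        y = tuple(sum(h[k] * bits[i + k] for k in range(len(h))) for i in range(n_inputs))
        states.append((bits[0], y))
    return states

def sign_of_yn_separates(h):
    """True if the boundary y_n = 0 classifies every noise-free state correctly."""
    return all((y[0] > 0) == (x == 1) for x, y in channel_states(h))

print(sign_of_yn_separates([1.0, 0.5]))   # H(z) = 1 + 0.5 z^-1
print(sign_of_yn_separates([0.5, 1.0]))   # H(z) = 0.5 + z^-1
```

With additive noise the observed points form clusters around these states, and the decision regions in Figures 2 and 3 must separate the clusters belonging to $x_n = 1$ from those belonging to $x_n = -1$.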

Next, we consider a nonlinear channel example and compare the learning characteristics of LRE with MSE based backpropagation. We model the nonlinear channel as a multipath channel $H'(z) = 1 + 0.5z^{-\ldots} + 0.25z^{-16}$ followed by a nonlinearity $0.5(\cdot)^{\ldots}$, and the PAM communication system uses 8 samples per bit with Nyquist pulse shaping. Note that since 8-sample pulse shaping is used, the multipath structure corresponds to fractional interference from the previous symbol and full interference from the second previous symbol. We implement the LRE algorithm for the binary alphabet given in (9) and (10), and gradient descent minimization of the MSE on the same MLP structure, for equalization of the given channel. Both algorithms use a 3-8-1 MLP structure.

To show the recovery property of LRE discussed in 
the previous section, we introduce an abrupt change 
(an exact sign change) in the channel characteristics 
after 150 iterations, effectively causing the current pa¬ 
rameter estimates to be at the wrong extreme. In 
Fig. 4(a), we show the transient characteristics of both 
algorithms with the abrupt change at 150 iterations at 
a signal to noise ratio (SNR) of 19 dB. As observed in 
the figure, LRE can recover from convergence at the 
wrong extreme very effectively. Starting from the very first iteration after the change, it follows the changes by adapting both its hidden and output layer weights within a few iterations. As we can observe in Fig. 4(a),
the MSE based MLP produces many wrong decisions before it can adapt to this new operating condition. Note that neither algorithm has fully converged at 150 iterations; if the sudden change causing misclassifications occurs later, the MSE based MLP might not be able to recover. This is shown in Fig. 4(b), where the sudden change is introduced at iteration 1000. Again, LRE adapts very rapidly to the new operating condition, recovering from convergence at the wrong extreme.
Figure 2: Decision regions formed by the LRE equalizer for $H(z) = 1 + 0.5z^{-1}$ at (a) 21 dB and (b) 11 dB SNR (“o” represents $x_n = -1$; the other marker represents $x_n = 1$).

Figure 3: Decision regions formed by the LRE equalizer for $H(z) = 0.5 + z^{-1}$ at (a) 21 dB and (b) 11 dB SNR (“o” represents $x_n = -1$; the other marker represents $x_n = 1$).

Figure 4: Recovery characteristics for MSE and LRE MLP equalizers with an abrupt change at (a) 150 and (b) 1000 iterations (SNR = 19 dB).

ABSTRACT

In the present paper the nonlinear principal component analysis (NLPCA) method is combined with vector quantization for the coding of images. The proposed coder is fully implemented using neural networks (NN). The NLPCA is realized using a backpropagation NN, while vector quantization is performed using the LVQ NN. The effects of quantization on the quality of the reconstructed image are then compensated by codebook vector optimization. Experimental results are presented for the coding of a sequence of images.

I. INTRODUCTION

The use of the Karhunen-Loeve Transform (KLT) for 
image coding is well known to be an optimal scheme 
for data compression based on the exploitation of cor¬ 
relation between neighboring pixels or groups of pix¬ 
els [1]. Its superior performance has made it a bench¬ 
mark against which other methods such as the Discrete 
Cosine Transform (DCT) and the Walsh Transform 
are compared. Despite its optimality properties, however, it has not found widespread application because of the difficulties associated with the required computation of the eigenvectors of the image covariance matrix. The use of the DCT as an approximation to the KLT is a well established, easily implementable practical alternative. Other well researched alternatives include the use of neural networks to implement the image Principal Component Analysis (PCA), thus computing its KLT [2], and the use of neural networks to realize a direct coder/decoder autoassociative mechanism based on examples [7].
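As an illustration of the neural route to the KLT, the sketch below uses Oja's single-neuron learning rule [3], $\mathbf{w} \leftarrow \mathbf{w} + \eta\, y(\mathbf{x} - y\mathbf{w})$ with $y = \mathbf{w}^T\mathbf{x}$, whose weight vector tends to the unit-norm principal eigenvector of the input covariance, i.e. the first KLT basis vector. Sanger's generalization [2] extracts further components. The 2-D synthetic data, learning rate and initial weights here are assumptions chosen only for illustration.

```python
import math
import random

def oja_first_component(samples, eta=0.01, w0=(0.5, 0.5)):
    """Estimate the first principal direction of zero-mean data with Oja's rule."""
    w = list(w0)
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))                     # neuron output
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]  # Oja update
    return w

# Zero-mean synthetic data with much larger variance along the first axis
rng = random.Random(1)
data = [(rng.uniform(-3.0, 3.0), rng.uniform(-0.5, 0.5)) for _ in range(5000)]
w = oja_first_component(data)
norm = math.hypot(*w)
print(w, norm)  # w should point (up to sign) along the first axis, with norm near 1
```

The rule needs no explicit covariance matrix or eigen-decomposition, which is the practical attraction over computing the KLT directly.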

It can be shown ([1], see also [8]) that under certain conditions, notably Gaussian statistics, the optimum data compression method is linear and is therefore, under the minimum mean-square criterion, identical to the PCA (KLT) method. However, this is not the case when the data statistics are, more realistically, a mixture of Gaussian distributions. A nonlinear Principal Component Analysis (NLPCA) has been functionally defined in [6] by a class of autoassociative neural networks. The resulting data compression scheme has been shown to outperform linear PCA and to be relatively easy to implement.

The present paper utilizes the NLPCA of [6] to effect image coding, aiming at more efficient coding than is possible with the linear PCA method [2]. It also combines this with a proposed data compressor based on vector quantization. According to this technique, a codebook is created based on the frequency of occurrence of vectors in a series of images. This vector quantization is realized by the Counterpropagation (Learning Vector Quantizer (LVQ)) Neural Network. Furthermore, an analysis is presented of the error due to coefficient quantization in the bottleneck layer. Finally, a technique is proposed for post-processing of the LVQ codebook vectors in order to minimize the quantization effects on the output of the decoding network.

II. NON LINEAR PRINCIPAL 
COMPONENT ANALYSIS (NLPCA) 

Let $Y$ represent an $n \times m$ data matrix ($n$ is the number of observed data vectors and $m$ is the dimension of those vectors). The goal of NLPCA is the replacement of each data vector (a row of $Y$) with the corresponding row of a matrix $T$ of dimension $n \times l$, $l < m$:

$$T = G(Y) \qquad (1)$$

where $G$ is a nonlinear vector function composed of $l$ nonlinear functions:

$$G = \{G_1, G_2, \dots, G_l\}; \qquad T_i = G_i(Y) \qquad (2)$$

and, by analogy with the linear PCA terminology, $T_1$ is referred to as the prime nonlinear factor of $T$, and $T_i$ as the $i$-th nonlinear factor. Reconstruction of the data vector is done by using a second nonlinear vector function

$$H = \{H_1, H_2, \dots, H_m\} \qquad (3)$$

and the reconstructed data vector has components

$$Y'_i = H_i(T), \qquad i = 1, \dots, m. \qquad (4)$$

A measure of the loss of information is the error $E = Y - Y'$. The functions $G$ and $H$ are selected so as to minimize the Euclidean norm of $E$.

III. NEURAL NETWORKS REALIZATION 
OF NLPCA 

The neural network implementing the NLPCA in [6] consists of joined realizations of the coding and decoding functions. The realization of the coding function $G$ has the rows of the data matrix $Y$ as inputs and thus has $m$ input neurons (see Figure 1). The hidden “mapping” layer consists of $M_1$ neurons, where $M_1$ must be larger than $l$. The output of the neural network is the corresponding row of matrix $T$ and is composed of $l$ neurons. It represents the projection of the data vector onto the $l$-dimensional space. The output neurons can be either linear or sigmoidal.

In the implementation of the decoding function $H$, the input is one row of matrix $T$ (see Figure 2). Thus the input layer consists of $l$ linear neurons. The hidden “demapping” layer consists of $M_2$ sigmoidal neurons, where $M_2 > l$. The output of this network is the reconstructed data vector, where each of the $m$ neurons represents one component of the data vector.

The combined network contains three hidden layers: the mapping layer involved in modeling $G$, the middle layer whose outputs represent the features $T$, and the demapping layer involved in modeling $H$ (see Figure 3). The second hidden layer of the combined network is the “bottleneck” layer, and its size determines the data compression to be achieved. The input and output layers of the combined network represent $Y$ and $Y'$, respectively. $Y$ is both the input to $G$ and the desired output from $H$; thus, the combined network must be trained to produce the identity mapping, $Y' \approx Y$. Supervised training can therefore be applied to the combined network. Training to learn the identity mapping has been called self-supervised backpropagation or autoassociation [7].
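The structure of the combined network can be sketched as a forward pass through the five layers $m$-$M_1$-$l$-$M_2$-$m$ with sigmoidal mapping and demapping layers. In the sketch below the bottleneck is linear (one of the two options mentioned above), the output layer is sigmoidal, pixel values are assumed to be scaled to $[0, 1]$, and the weights are random; training by self-supervised backpropagation on the target $Y' = Y$ is omitted. The class and method names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    return z

def dense(weights, biases, x, act):
    """One fully connected layer: act(W x + b)."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]

def random_layer(rng, n_out, n_in, scale=0.1):
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class NLPCANet:
    """Autoassociative network m - M1 - l - M2 - m (hypothetical helper class)."""

    def __init__(self, m, m1, l, m2, seed=0):
        rng = random.Random(seed)
        self.mapping = random_layer(rng, m1, m)     # part of G (sigmoidal)
        self.bottleneck = random_layer(rng, l, m1)  # features T (linear here)
        self.demapping = random_layer(rng, m2, l)   # part of H (sigmoidal)
        self.output = random_layer(rng, m, m2)      # reconstruction Y' (sigmoidal)

    def encode(self, y):
        """G: R^m -> R^l."""
        h = dense(*self.mapping, y, sigmoid)
        return dense(*self.bottleneck, h, identity)

    def decode(self, t):
        """H: R^l -> R^m."""
        h = dense(*self.demapping, t, sigmoid)
        return dense(*self.output, h, sigmoid)

    def reconstruction_error(self, y):
        """Squared Euclidean norm of E = Y - Y' for one data vector."""
        y_rec = self.decode(self.encode(y))
        return sum((a - b) ** 2 for a, b in zip(y, y_rec))

# Usage with the dimensions of Section VI: m = 16, M1 = M2 = 35, l = 8
net = NLPCANet(m=16, m1=35, l=8, m2=35)
block = [k / 15 for k in range(16)]  # a 4x4 block flattened and scaled to [0, 1]
features = net.encode(block)
```

Only the `encode` half runs at the transmitter and only the `decode` half at the receiver; the bottleneck width $l$ fixes how many coefficients per block must be sent (before any further quantization).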


IV. VECTOR QUANTIZATION 

Additional data compression was achieved by using the Hecht-Nielsen Counterpropagation Neural Network (CPN) on the coder output $T$ to produce its quantized version $\hat{T}$. The CPN [9] is a feedforward, unsupervised learning neural network formed as a combination of Kohonen's Learning Vector Quantizer with a Grossberg outstar. In our case it serves for the further quantization of the coder output into indexed clusters; the CPN output is the index number of the cluster. Thus, only the index (an integer) needs to be transmitted, effecting further data compression of the (real-valued) coder outputs. Upon reception of the index, the decoder reconstructs the image from a codebook formed on the basis of the frequency of occurrence of vectors in a series of images.
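A minimal sketch of the quantize, transmit and reconstruct path, assuming a codebook of coefficient vectors is already available: the encoder sends only the index of the nearest codeword, and the decoder looks it up. The `competitive_update` function shows the winner-take-all Kohonen-layer step typically used to form such a codebook; the learning rate and all names are illustrative, and the frequency-of-occurrence codebook construction described above is not reproduced.

```python
import math

def encode(codebook, t):
    """Index of the nearest codeword (Euclidean distance): the value transmitted."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], t))

def decode(codebook, index):
    """Receiver side: look the codeword up by its index."""
    return codebook[index]

def competitive_update(codebook, t, alpha=0.05):
    """One winner-take-all step: move the winning codeword toward t (in place)."""
    i = encode(codebook, t)
    codebook[i] = [c + alpha * (x - c) for c, x in zip(codebook[i], t)]
    return i
```

Since the decoder only ever sees codewords, the reconstruction is $H(\hat{T})$ rather than $H(T)$, which is the source of the quantization error addressed in the next paragraph.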

The LVQ network guarantees the minimization of the quantization error norm $\|T - \hat{T}\|$. However, for data compression purposes, the minimization of the input-output error $Y - \hat{Y}$ is needed ($\hat{Y} = H(\hat{T})$). Thus, in our approach, the statistics of the vector quantization error $T - \hat{T}$ are used to update the weights of the NLPCA decoder in such a way that the input-output reconstruction error is minimized.

V. CODEBOOK VECTOR OPTIMIZATION 

A basic drawback of the proposed coding approach is that the NLPCA networks and the LVQ networks are trained independently. This is because, if the two networks were trained simultaneously (i.e. if at each iteration of the NLPCA training the output of the third layer were fed as an input to the LVQ training network, and the output of the LVQ network were fed as an input to the fourth layer of the NLPCA network, as shown in Figure 3), the stability of the steepest descent training procedure would no longer be guaranteed.

Due to the nonlinearity of the proposed coder, a small error (due to quantization) may lead to large errors in the decoding phase of the network. Motivated by this fact, we propose a post-processing phase after the training of the NLPCA and LVQ networks that appropriately modifies the codebook vectors so that the input-output mean square error is minimized. According to the proposed technique, if $b_i$ is the output of the $i$th neuron of the hidden layer of the decoding network, the following error must be minimized:

$$E = \sum_{j=1}^{m} (y_j - y'_j)^2 \qquad (5)$$

where $y$ is the input vector and $y'$ is the output vector.

The equations describing the output vector $y'$ in terms of the quantized coefficient vectors (output of the LVQ network) are:

$$b_i = f\Big(\sum_{h=1}^{l} v_{hi} T_h + \theta_i\Big) \qquad (6)$$

$$y'_j = f\Big(\sum_{i=1}^{M_2} w_{ij} b_i + \tau_j\Big) \qquad (7)$$

where

$$f(x) = \frac{1}{1 + e^{-x}},$$

$v$ are the weights connecting the first with the second layer of the decoding network, and $w$ are the weights connecting the second with the third layer of the decoding network. Minimization of (5) in terms of the $T_k$, $k = 1, \dots, l$, implies

$$\frac{\partial E}{\partial T_k} = -2\sum_{j=1}^{m} (y_j - y'_j)\,\frac{\partial y'_j}{\partial T_k} = 0, \qquad k = 1, \dots, l.$$

Note that

$$\frac{\partial y'_j}{\partial T_k} = y'_j(1 - y'_j) \sum_{i=1}^{M_2} w_{ij}\,\frac{\partial b_i}{\partial T_k}, \qquad \frac{\partial b_i}{\partial T_k} = b_i(1 - b_i)\, v_{ki}.$$

Thus, the following system of $l$ nonlinear equations must be solved in terms of $T_k$, for $k = 1, \dots, l$:

$$\sum_{j=1}^{m} (y_j - y'_j)\, y'_j(1 - y'_j) \sum_{i=1}^{M_2} w_{ij}\, b_i(1 - b_i)\, v_{ki} = 0, \qquad k = 1, \dots, l.$$

Note that the $T_k$'s are implicitly contained in the above equations, as seen from equations (6) and (7) for $b_i$ and $y'_j$.

The steepest descent method was used to solve the above system of nonlinear equations. The LVQ codebook vectors were used as initial estimates for the $T_k$. The post-processed codebook vectors are then used for coding the coefficients of the coding network of the NLPCA.
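The post-processing step can be sketched as follows: starting from an LVQ codeword, steepest descent is run on $E$ using the chain-rule gradient derived above, with the decoder weights $v$, $w$ and biases $\theta$, $\tau$ held fixed. Summing (5) over all input vectors assigned to the same codeword is an assumption of this sketch (the text states the error for a single input vector), as are the step size, iteration count and function names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(T, net):
    """Eqs. (6)-(7): coefficients T -> hidden outputs b -> reconstruction y'.

    net = (V, theta, W, tau) with V[k][i] = v_ki and W[i][j] = w_ij.
    """
    V, theta, W, tau = net
    b = [sigmoid(sum(T[k] * V[k][i] for k in range(len(T))) + theta[i])
         for i in range(len(theta))]
    y_rec = [sigmoid(sum(W[i][j] * b[i] for i in range(len(b))) + tau[j])
             for j in range(len(tau))]
    return b, y_rec

def total_error(ys, T, net):
    """Eq. (5), summed over the input vectors ys mapped to codeword T."""
    _, y_rec = decode(T, net)
    return sum(sum((yj - rj) ** 2 for yj, rj in zip(y, y_rec)) for y in ys)

def grad_error(ys, T, net):
    """dE/dT_k via the chain rule used to derive the nonlinear system above."""
    V, _, W, _ = net
    b, y_rec = decode(T, net)
    g = [0.0] * len(T)
    for y in ys:
        for j, rj in enumerate(y_rec):
            d_out = (y[j] - rj) * rj * (1.0 - rj)
            for k in range(len(T)):
                inner = sum(W[i][j] * b[i] * (1.0 - b[i]) * V[k][i] for i in range(len(b)))
                g[k] -= 2.0 * d_out * inner
    return g

def refine_codeword(ys, T0, net, mu=0.1, steps=100):
    """Steepest descent on E, starting from the LVQ codeword T0."""
    T = list(T0)
    for _ in range(steps):
        g = grad_error(ys, T, net)
        T = [t - mu * gk for t, gk in zip(T, g)]
    return T
```

With a sufficiently small step size each iteration should not increase $E$; in a complete coder this refinement would be repeated for every codeword in the codebook.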


VI. EXPERIMENTAL RESULTS 

The proposed still image coding method was applied to the coding of the first 9 frames of the image sequence “Trevor White” of size 256 × 256. The images were divided into blocks of dimension 4 × 4, creating vectors of dimension 16. The neural network implementing the NLPCA was chosen with $m = 16$, i.e. 16 neurons in the first and last layers, $M_1 = M_2 = 35$ neurons in the second and fourth layers, and $l = 8$ neurons in the middle layer.

We examined two approaches for the training of the NLPCA network: a) only the first image was used as the training set; b) the first, fifth and ninth images were used as the training set. The LVQ network was trained with the coefficients corresponding to the training set of the NLPCA network. Codebooks of 1024, 512 and 256 vectors were tested, corresponding to bit rates of 0.625, 0.5625 and 0.5 bits/pixel, respectively.
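The bit rates follow from transmitting one codebook index per 4 × 4 block: a codebook of $K$ vectors needs $\log_2 K$ bits for 16 pixels. A one-line check (the function name is illustrative):

```python
import math

def bits_per_pixel(codebook_size, block_shape=(4, 4)):
    """Rate of an index-only VQ coder: log2(codebook size) bits per block."""
    return math.log2(codebook_size) / (block_shape[0] * block_shape[1])

for k in (1024, 512, 256):
    print(k, bits_per_pixel(k))
```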

The PSNR of the reconstructed images is shown in Tables 1 and 2. The original and decoded frames 1, 5 and 7 of the “Trevor White” sequence are shown in Figures 4-9. The results show that the method works satisfactorily when the NLPCA network is trained with the set of three images. When only the first image is used for training the NLPCA network, only the first 4 images are coded efficiently, since the motion is very small and their correlation with the first image is high.
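For reference, PSNR is conventionally computed as $10\log_{10}(255^2/\mathrm{MSE})$ for 8-bit images. The sketch below assumes that peak value, since the text does not state the pixel depth, and operates on flattened pixel lists.

```python
import math

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)
```

On this scale, an improvement of 1 dB corresponds to reducing the mean squared error by about 21%.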

The use of the method proposed in Section V for quantization error compensation during codebook initialization was seen to improve the results considerably (more than 1 dB improvement compared to Tables 1 and 2), especially for the codebooks of 256 and 512 vectors, where the quantization effects are not negligible.

The NLPCA method was applied to the compres¬ 
sion of “Lenna” both with and without additive noise. 
Blocks of dimension 4x4 were used, with various sizes 
of the mapping and the bottleneck layer and with linear 
bottleneck layer units. The results were compared to 
the results of PCA coding (KLT). It was seen that with Gaussian noise, the results were comparable with those of the linear PCA scheme; in cases of non-Gaussian noise addition, however, the NLPCA method performed considerably better than the PCA method.

VII. REFERENCES 

[1] A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and Compression, Plenum Press, New York and London, 1988.

[2] T. D. Sanger, “Optimal Unsupervised Learning in a Single Layer Linear Feedforward Neural Network”, Neural Networks, Vol. 2, pp. 459-463, 1989.

[3] E. Oja, “A Simplified Neural Model as a Principal Component Analyser”, J. Math. Biology, Vol. 15, pp. 267-273, 1982.

[4] P. Baldi and K. Hornik, “Neural Networks and Principal Component Analysis: Learning from Examples without Local Minima”, Neural Networks, Vol. 2, pp. 53-62, 1989.





Frame   0.625 bits/pixel   0.5625 bits/pixel   0.5 bits/pixel
1       35.00              31.10               29.10
2       31.60              29.91               28.50
3       31.21              29.77               28.35
4       30.03              29.01               27.95
5       29.68              …                   27.77
6       29.30              28.58               27.70
7       29.22              28.30               27.61
8       28.85              28.11               27.40
9       28.56              28.01               27.30

Table 1: PSNR for the coding of the first 9 frames of the 
“Trevor White” sequence using the NLPCA network 
trained using only the first frame. 


Frame   0.625 bits/pixel   0.5625 bits/pixel   0.5 bits/pixel
1       32.25              29.95               28.85
2       30.80              29.17               28.50
3       31.02              29.21               28.48
4       30.68              29.01               28.25
5       30.81              …                   28.70
6       …                  29.22               28.82
7       31.25              29.67               29.05
8       31.26              29.70               29.13
9       32.31              30.20               29.35

Table 2: PSNR for the coding of the first 9 frames of the “Trevor White” sequence using the NLPCA network trained using the first, fifth and ninth frames.

Figure 1: The encoding phase of the NLPCA network.

Figure 2: The decoding phase of the NLPCA network.

Figure 3: The encoding and decoding phases of the NLPCA network.

Figure 4: Original frame 1 of the “Trevor White” sequence.

Figure 5: Reconstructed frame 1 of the “Trevor White” sequence using NLPCA trained only on this frame.

Figure 6: Original frame 5 of the “Trevor White” sequence.

Figure 7: Reconstructed frame 5 of the “Trevor White” sequence coded at 0.625 bits/pixel using NLPCA trained only on the first frame.

Figure 8: Original frame 7 of the “Trevor White” sequence.

Figure 9: Reconstructed frame 7 of the “Trevor White” sequence coded at 0.625 bits/pixel using NLPCA trained with the first, fifth and ninth images.

ABSTRACT

In the present paper we determine two families of anal¬ 
ysis and synthesis vector filters which achieve opti¬ 
mal construction of multiresolution vector sequences 
by minimizing the variance of the error signals be¬ 
tween successive pyramid levels. A measure of the en¬ 
tropy reduction achieved by the pyramid is in this way 
maximized. The effect of this is to ensure that the 
lower-resolution image produced by the primary sub¬ 
band bears maximum resemblance to the input image. 
Furthermore, it is assumed that additive transmission 
noise corrupts the downsampled signal prior to the syn¬ 
thesis stage. It is seen that under noiseless or lossless 
transmission conditions, the two above families of opti¬ 
mal analysis and synthesis filters coincide. The results 
are evaluated experimentally for the vector coding of 
color images. 

I. INTRODUCTION

Subband analysis/synthesis techniques have been extensively studied for image and video coding applications [1]. In the subband coding technique, the image is decomposed by a filter bank into several sub-images corresponding to different frequency bands, and the sub-images are coded instead of the original image.

Pyramidal image coding has also been studied [2, 3], and optimal construction of the pyramid sequence was sought by minimizing, for each level of the pyramid, the variance of the error image. In this way, a measure of the entropy reduction achieved by the pyramid is maximized. If the pyramid is to be used for the scalable or progressive coding of the sequence, this construction also ensures the production of a same-size copy of the signal or image which, at a lower resolution, bears as much resemblance to the original as possible. In a typical scalable coding application, this copy may be transmitted via a slower communication channel, while the original is perfectly reconstructed from the entire pyramid. In an alternative scheme, a perfect reconstruction filter bank may be used for the transmission of the full-resolution signal or image, of which one or more bands are retained for the construction of one or more copies of the original at lower resolutions. Again, the filters are chosen so as to ensure that, at the lower resolutions, the copies bear as much resemblance to the original image as possible. This is achieved by minimizing the error that occurs when only one band of the perfect reconstruction filter bank is retained.

Along with scalar processing, vector processing has recently attracted particular interest in the signal and image processing community [4]. Vector transform coding techniques have been used for image coding applications [6] to remove the inter-vector correlation.

In the present paper the results of [2, 3, 6] are first generalized, and the problem of the optimal design of a vector pyramid or a vector subband coding scheme is addressed. Furthermore, the results are generalized to the case where transmission noise corrupts the downsampled signal prior to the synthesis stage.

In the examined scheme, either the analysis or the synthesis part is considered fixed (i.e., the analysis or the synthesis filters are fixed), and the statistics of the quantization process involved in transform coefficient coding are considered known. The problem then is to define the optimal synthesis (analysis) vector filters that minimize the distortion due to the quantization of the subband or pyramid vector transform coefficients. Thus, specific knowledge about the power spectra of the original signal and of the quantization noise can be incorporated to design the vector filter bank optimally so as to minimize the quantization distortion.
II. ORTHOGONAL TWO-CHANNEL VECTOR FILTER BANKS


Recently, vector filter banks were introduced in [7], and the perfect reconstruction orthogonal analysis/synthesis vector filters H(ω) and G(ω) were defined that satisfy the following properties:

$$H(0) = I_N, \quad \text{or} \quad \sum_k H_k = I_N,$$

where

$$H(z) = \sum_k H_k z^{-k},$$

and

$$H(z)H^T(z^{-1}) + H(-z)H^T(-z^{-1}) = I_N,$$

$$G(z)H^T(z^{-1}) + G(-z)H^T(-z^{-1}) = I_N,$$

and

$$G(z)G^T(z^{-1}) + G(-z)G^T(-z^{-1}) = I_N,$$

and also that $H_k$ is symmetric. In the same work, it was proven that an M × M matrix F(ω) is paraunitary and FIR if and only if

$$F(\omega) = e^{-j m_0 \omega}\, U_p(\omega) \cdots U_1(\omega)\, F,$$

where $m_0$ is an integer, p is a nonnegative integer, F is an M × M constant unitary matrix and

$$U_l(\omega) = I_M + (e^{-j\omega} - 1)\, v_l v_l^H,$$

where $v_l$ is a unit-norm constant M × 1 vector, for l = 1, ..., p.

In the present work we derive the optimal biorthogonal perfect reconstruction vector filters.

Figure 2: Hierarchical Vector Pyramidal encoder.

Figure 2: Hierarchical Vector Subband encoder.

III. OPTIMAL VECTOR PYRAMIDAL AND SUBBAND DECOMPOSITIONS

A multiresolution data representation consists of a sequence of linear transformations of the data with successively reduced resolution. If the vector sequence x[m] represents the original data, the construction of the multiresolution sequence begins with the computation of the predicted value u[m] of each x[m] as a local weighted average:

$$u[m] = \sum_{i=-N}^{N} h[i]\, x[m-i] = h \star x \quad (1)$$

where the asterisk ★ denotes convolution. A well-known specific multiresolution representation is based on the construction of the “Gaussian pyramid”, in which (1) is followed by a decimating filter. Interpolation is then used to revert to the original image size:

$$v[m] = \begin{cases} u[2m_1] & \text{if } m = 2m_1 \\ 0 & \text{otherwise} \end{cases} \quad (2)$$

If the pyramid is used in signal or image transmission, the downsampled signal is subject to corruption by noise. The quantizers and transmission noise sources are physically located before the upsamplers. For the sake of notational simplicity, however, we shall equivalently place them after the upsamplers, assuming simply that the zeros interpolated by the upsamplers are always quantized to zero. If additive noise n[m] is assumed, the output of the interpolating filter is


$$y[m] = \sum_{i=-N'}^{N'} g[i]\,\big(v[m-i] + n[m-i]\big) \quad (3)$$

The error image is:

$$e[m] = x[m] - y[m] \quad (4)$$

and the total error variance is:

$$E = \mathrm{E}\{e^T[m]\, e[m]\} \quad (5)$$

The process is repeated for the reduction in resolution of the sequence x'[m] = u[2m]. The corresponding sequence of error images is known as the “Laplacian” pyramid.
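One noiseless level of the construction in (1)-(7) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the vector sequence is stored as an (L, M) array, the matrix taps h[i] and g[i] are stacked into (2N+1, M, M) arrays, samples outside the signal are treated as zero, and n[m] = 0. The names `vector_conv` and `pyramid_level` are hypothetical.

```python
import numpy as np

def vector_conv(kernel, x):
    """Compute y[m] = sum_{i=-N}^{N} kernel[i] @ x[m - i] with zero padding.

    kernel: (2N+1, M, M) array; kernel[k] holds the matrix tap i = k - N.
    x:      (L, M) array, one M-vector per sample.
    """
    n_taps = kernel.shape[0]
    N = n_taps // 2
    L = x.shape[0]
    y = np.zeros(x.shape, dtype=float)
    for m in range(L):
        for k in range(n_taps):
            i = k - N
            if 0 <= m - i < L:
                y[m] += kernel[k] @ x[m - i]
    return y

def pyramid_level(x, h, g):
    """One noiseless level of the vector Gaussian/Laplacian pyramid."""
    u = vector_conv(h, x)                     # (1): local weighted average
    w = (1 + (-1) ** np.arange(len(x))) / 2   # (7): 1 at even m, 0 at odd m
    v = w[:, None] * u                        # (6): decimate, then re-insert zeros
    y = vector_conv(g, v)                     # (3) with n[m] = 0: interpolation
    e = x - y                                 # (4): error (Laplacian) image
    x_next = u[::2]                           # x'[m] = u[2m]: input to the next level
    return e, x_next, v

# Example with scalar-times-identity taps on a constant 3-channel signal.
M = 3
h = np.stack([c * np.eye(M) for c in (0.25, 0.5, 0.25)])
g = np.stack([c * np.eye(M) for c in (0.5, 1.0, 0.5)])
x = np.ones((8, M))
e, x_next, v = pyramid_level(x, h, g)
print(np.abs(e[2:-2]).max(), x_next.shape)
```

With these illustrative taps, the interior of a constant signal is reproduced exactly, so the error image is zero away from the borders. The border samples differ only because of the zero padding.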

An optimal construction of the pyramid sequence may be sought by minimizing, for each level of the pyramid, the variance of the error image (5). In this way, a measure of the entropy reduction achieved by the pyramid is maximized.

The output of the prefilter followed by decimation and interpolation by zeros is given by

$$v[m] = w[m] \sum_i h[i]\, x[m-i] = w[m]\, u[m] \quad (6)$$

where

$$w[m] = \frac{1 + (-1)^m}{2}. \quad (7)$$


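The sequence v[m] is not wide-sense stationary; however, the time averages

$$R_{vx}[p] = \lim_{K\to\infty} \frac{1}{2K+1} \sum_{m=-K}^{K} v[m]\, x^T[m-p], \qquad R_v[p] = \lim_{K\to\infty} \frac{1}{2K+1} \sum_{m=-K}^{K} v[m]\, v^T[m-p] \quad (8)$$

are seen to exist under nonrestrictive conditions on x[·]. With the above definitions, it can be proven that

$$R_{vx}[p] = \frac{1}{2} \sum_i h[i]\, R_x[p-i] = \frac{1}{2} (h \star R_x)[p].$$

Also,

$$R_v[p] = \frac{1}{2}\, w[p]\, \big(h[p] \star R_x[p] \star h^T[-p]\big) \quad (9)$$

The corresponding Z-transforms are therefore related by

$$\Phi_{vx}(z) = \frac{1}{2} H(z)\, \Phi_x(z) \quad (10)$$

$$\Phi_v(z) = \frac{1}{4}\big(A(z) + A(-z)\big) = \frac{1}{2} P(z) \quad (11)$$

where

$$A(z) = H(z)\, \Phi_x(z)\, H^T(z^{-1}). \quad (12)$$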

Then, using the above arguments, it is easily seen that the sequence s[m] possesses auto- and cross-correlations and spectra defined as in (8). Note also that since s[m] = 0 for odd m, its correlation and spectrum satisfy

$$R_s[2n+1] = 0, \qquad \Phi_s(z) = \Phi_s(-z) \quad (13)$$

Likewise, the output y of the interpolating filter is seen, as before, to possess cross- and autocorrelation functions defined as in (8). Their Z-transforms are related by

$$\Phi_{yx}(z) = G(z)\, \Phi_{sx}(z) \quad (14)$$

$$\Phi_y(z) = G(z)\, \Phi_s(z)\, G^T(z^{-1}) \quad (15)$$


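From (5), the error variance is found by

$$2\pi j E = 2\pi j\, \mathrm{tr}\big[\mathrm{E}\{e[m]\, e^T[m]\}\big] = \mathrm{tr}\Big[\oint \Phi_e(z)\, z^{-1}\, dz\Big] \quad (16)$$

where tr[F] is the trace of the matrix F and $\Phi_e(z)$ is the power spectrum of the error e[m]. Clearly $\Phi_e(z) = \Phi_x(z) - \Phi_{xy}(z) - \Phi_{yx}(z) + \Phi_y(z)$; hence, from (14)-(16),

$$2\pi j E = \mathrm{tr}\Big[\oint \big(\Phi_x(z) - 2G(z)\, \Phi_{sx}(z)\big) z^{-1}\, dz\Big] + \mathrm{tr}\Big[\oint G(z)\, \Phi_s(z)\, G^T(z^{-1})\, z^{-1}\, dz\Big] \quad (17)$$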

Thus, the design of either the pyramidal or the subband 
decomposition scheme should aim at the minimization 
of the error variance (17). 

IV. OPTIMAL FIR AND IIR VECTOR FILTERS

With an arbitrary given h[i], the optimum FIR filter g[i] in (3) will minimize the error variance (5) if the well-known orthogonality condition holds:

$$\mathrm{E}\Big\{\Big(x[m] - \sum_i g[i]\, s[m-i]\Big) s^T[m-l]\Big\} = 0, \qquad l = -N, \dots, N.$$

This implies

$$R_{xs}[l] = \sum_i g[i]\, R_s[l-i], \qquad l = -N, \dots, N.$$

Given (11), this may be separated into two sets of equations for the identification, respectively, of the even- and odd-indexed coefficient matrices g[i]:

$$R_{xs}[2l_1] = \sum_{i_1} g[2i_1]\, R_s[2l_1 - 2i_1]$$

$$R_{xs}[2l_2 + 1] = \sum_{i_2} g[2i_2 + 1]\, R_s[2l_2 - 2i_2]$$

which fully define g[i], i = -N, ..., N. The optimal IIR filters are found by direct minimization of (17). We shall consider the noiseless case first.
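The orthogonality condition above is the normal-equation form of a linear least-squares problem, so the FIR synthesis filter can also be estimated from sample data by replacing $R_{xs}$ and $R_s$ with time averages. The sketch below assumes (L, M) arrays x and s and solves the resulting least-squares problem with `numpy.linalg.lstsq`. The helper names `synthesize` and `optimal_fir_synthesis` are hypothetical, and this data-driven estimate is an illustration rather than the paper's closed-form procedure. When s[m] vanishes at odd m, the same solve decouples into the even- and odd-indexed equations given above.

```python
import numpy as np

def synthesize(g, s):
    """y[m] = sum_{i=-N}^{N} g[i] @ s[m - i] with zero padding; g has shape (2N+1, M, M)."""
    n_taps, N, L = g.shape[0], g.shape[0] // 2, s.shape[0]
    y = np.zeros(s.shape, dtype=float)
    for m in range(L):
        for k in range(n_taps):
            if 0 <= m - (k - N) < L:
                y[m] += g[k] @ s[m - (k - N)]
    return y

def optimal_fir_synthesis(x, s, N):
    """Least-squares g[-N..N] minimising sum_m ||x[m] - sum_i g[i] s[m - i]||^2.

    Sample-average version of R_xs[l] = sum_i g[i] R_s[l - i], l = -N..N.
    """
    L, M = s.shape
    # Each row stacks s[m - i]^T for i = -N..N (interior samples only).
    S = np.array([np.concatenate([s[m - i] for i in range(-N, N + 1)])
                  for m in range(N, L - N)])
    X = x[N:L - N]
    W, *_ = np.linalg.lstsq(S, X, rcond=None)   # X is approximated by S @ W
    # Block k of W (rows k*M .. (k+1)*M - 1) equals g[k - N]^T.
    return np.stack([W[k * M:(k + 1) * M].T for k in range(2 * N + 1)])

# Example: recover a known 3-tap 2 x 2 filter from noiseless data.
rng = np.random.default_rng(1)
g_true = rng.standard_normal((3, 2, 2))   # N = 1, M = 2
s = rng.standard_normal((200, 2))
x = synthesize(g_true, s)
g_est = optimal_fir_synthesis(x, s, N=1)
print(np.max(np.abs(g_est - g_true)))
```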

IV.1. NOISELESS CASE

In this case s = v, and hence

$$\Phi_{sx}(z) = \Phi_{vx}(z) = \frac{1}{2} H(z)\, \Phi_x(z) \quad (18)$$

$$\Phi_s(z) = \Phi_v(z) = \frac{1}{2} P(z) \quad (19)$$
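Substituting (18) and (19) into (17), with P as in (19), the error variance is found to be

$$2\pi j E = \mathrm{tr} \oint \Big[\Phi_x(z) - H(z)\, \Phi_x(z)\, G(z) + \frac{1}{2} H(z)\, \Phi_x(z)\, H^T(z^{-1})\, Q(z)\Big] z^{-1}\, dz \quad (20)$$

where

$$Q(z) = \frac{1}{2}\big(G^T(z^{-1})\, G(z) + G^T(-z^{-1})\, G(-z)\big). \quad (21)$$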

The optimal pyramidal and subband decompositions will be obtained by minimization of the above expression (20) for the error variance. Assuming first that the analysis filter H(z) is fixed and given, the corresponding optimum synthesis filter minimizing (20) is given by

$$G(z) = \Phi_x(z)\, H^T(z^{-1})\, P^{-1}(z). \quad (22)$$

Conversely, if the synthesis filter G(z) is fixed, the optimum analysis filter can be found. It is given by

$$H(z) = Q^{-1}(z)\, G^T(z^{-1}) \quad (23)$$

where Q(z) is given by (21). The globally optimum filter pair (H(z), G(z)) is found by substituting either (22) or (23) into (20) and minimizing the resulting expression. The minima found either way are easily seen to be identical.

To develop the optimum filter bank for vector signal coding, an extra constraint is that the selected filters H(z) and G(z) be such that a perfect reconstruction filter bank can be built with them as the analysis and synthesis filters, respectively, of its primary band. The analysis and synthesis filters in the remaining bands are chosen to satisfy the perfect reconstruction principle. In the noiseless case, the filters defined by (14), (15) always satisfy the perfect reconstruction condition. Thus, in this case, the low-pass band of the filter bank has analysis and synthesis filters identical to those of the optimal pyramidal analysis.

IV.2. NOISY CASE

With considerable insight into the structure of the optimal pyramids and filter banks gained from the noiseless case, the noisy case may now be considered. Again, if the analysis filter is fixed, the optimum synthesis filter in the pyramidal configuration will be found by minimizing (17). It can be shown that it is given by

$$G(z) = \Phi_{sx}^T(z^{-1})\, \Phi_s^{-1}(z). \quad (24)$$

This is a completely general expression for the optimal synthesis filter, which can be further analyzed under some additional simplifying assumptions. For example, the additive noise may be assumed to be uncorrelated with the input:

$$\Phi_{vn}(z) = 0 \quad (25)$$

This assumption is reasonable in the case of transmission noise and is justified for a large class of practical quantizers, which includes fine and dithered quantizers [1]. In this case

$$\Phi_{sx}(z) = \Phi_{vx}(z) = \frac{1}{2} H(z)\, \Phi_x(z)$$

$$\Phi_s(z) = \Phi_v(z) + \Phi_r(z) = \frac{1}{2} P(z) + \Phi_r(z) \quad (26)$$

where $\Phi_r(z)$ is the noise power spectral density. Note also that

$$\Phi_r(z) = \Phi_r(-z). \quad (27)$$

From (24), the optimal G(z) is

$$G(z) = \Phi_x(z)\, H^T(z^{-1})\, \big[P(z) + 2\Phi_r(z)\big]^{-1} \quad (28)$$

It can be seen that in this case, unlike the noiseless case, the optimal pyramidal synthesis filters fail to satisfy the necessary condition for perfect reconstruction and therefore do not offer a solution to the problem of optimal filter bank construction, whether in the general form (24) or in the special case (28). However, in the latter case, the analysis filters may be chosen so as to form a perfect reconstruction bank with the optimal synthesis filters.
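To see how the noise term in (28) acts, the sketch below evaluates (28) on the unit circle in the scalar case M = 1, taking P(z) = (A(z) + A(-z))/2 with A(z) = H(z)Φ_x(z)H(z^{-1}). The AR(1) input spectrum, the 3-tap analysis filter and the white noise level are illustrative assumptions, and the function names are hypothetical. For real taps, H(z^{-1}) on the unit circle is the complex conjugate of H(e^{jω}), and z → -z corresponds to ω → ω + π. A white noise spectrum is constant, so it satisfies (27).

```python
import numpy as np

def freq_response(taps, omega):
    """H(e^{jw}) = sum_k taps[k] e^{-jwk} for a causal FIR filter."""
    k = np.arange(len(taps))
    return np.exp(-1j * np.outer(omega, k)) @ np.asarray(taps, dtype=float)

def ar1_psd(rho, omega):
    """Unit-variance AR(1) spectrum (1 - rho^2) / |1 - rho e^{-jw}|^2."""
    return (1 - rho ** 2) / np.abs(1 - rho * np.exp(-1j * omega)) ** 2

def optimal_synthesis_scalar(h_taps, rho, noise_psd, omega):
    """Scalar form of (28): G = Phi_x conj(H) / (P + 2 Phi_r) on z = e^{jw}."""
    H = freq_response(h_taps, omega)
    H_alias = freq_response(h_taps, omega + np.pi)                 # H(-z)
    A = np.abs(H) ** 2 * ar1_psd(rho, omega)                       # A(z)
    A_alias = np.abs(H_alias) ** 2 * ar1_psd(rho, omega + np.pi)   # A(-z)
    P = 0.5 * (A + A_alias)
    return ar1_psd(rho, omega) * np.conj(H) / (P + 2 * noise_psd)

omega = np.linspace(-np.pi, np.pi, 257)
h = [0.25, 0.5, 0.25]
G_clean = optimal_synthesis_scalar(h, 0.9, 0.0, omega)   # reduces to (22)
G_noisy = optimal_synthesis_scalar(h, 0.9, 0.1, omega)
print(np.abs(G_clean).max(), np.abs(G_noisy).max())
```

With zero noise the function reduces to the noiseless solution (22). A positive noise spectrum enlarges the denominator, so the synthesis gain shrinks at every frequency.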

V. EXPERIMENTAL RESULTS 

The proposed vector subband coding method was tested for the coding of multichannel images. The results are evaluated in the coding of the color RGB image “Peppers” of size 256 × 256. In all the cases examined, uniform quantization of the transform coefficients was applied.

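For the definition of the first family of analysis/synthesis filters, the conversion from YUV to RGB format given by

$$x_{rgb} = A\, x_{yuv}$$

was used, where A is defined by

$$A = \begin{bmatrix} 1 & 0 & 1.402 \\ 1 & -0.34414 & -0.71414 \\ 1 & 1.772 & 0 \end{bmatrix}. \quad (29)$$

Thus, one choice for the synthesis vector filter is $G(z) = A\Lambda(z)$, where

$$\Lambda(z) = \begin{bmatrix} \Lambda_1(z) & 0 & 0 \\ 0 & \Lambda_2(z) & 0 \\ 0 & 0 & \Lambda_3(z) \end{bmatrix} \quad (30)$$

and $\Lambda_i(z)$, for i = 1, ..., 3, are FIR low-pass filters. In this case the analysis filter would be given by

$$H(z) = D^{-1}(z)\, \Lambda^T(z^{-1})\, A^T$$

where

$$D(z) = \frac{1}{2}\big(\Lambda^T(z^{-1}) A^T A \Lambda(z) + \Lambda^T(-z^{-1}) A^T A \Lambda(-z)\big).$$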


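The fitness of the multichannel AR models to color-image modeling is measured in terms of the prediction mean square error

$$\mathrm{MSE} = \frac{1}{N^2} \sum_i \Big\| x[i] - \sum_m C[m]\, x[i-m] \Big\|^2$$

for an N × N image. The power spectrum of the input is then given by

$$\Phi_x(z) = \Big[I - \sum_m C[m]\, z^{-m}\Big]^{-1} \Phi_w(z) \Big[I - \sum_m C^T[m]\, z^{m}\Big]^{-1}$$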


Simulations were also performed using the second family of vector filters, and the results were comparable with those obtained with the first choice of filters.


The above filter matrices H(z) and G(z) are chosen as the analysis/synthesis filters of the primary band of the vector subband coding scheme. The analysis and synthesis filters in the remaining bands are chosen to satisfy the perfect reconstruction principle. The above filters were tested for the coding of the color image “Peppers”. In Figure 1 the proposed technique is compared, in terms of PSNR versus bit rate, with scalar subband coding using the same type of filters [6].

The conversion from the standard Red Green Blue (RGB) format to YUV format may also be used for the definition of the second family of analysis/synthesis filters. The conversion, written in matrix form, is

$$x_{yuv} = B\, x_{rgb}$$

where

$$B = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.1687 & -0.3313 & 0.5 \\ 0.5 & -0.4187 & -0.0813 \end{bmatrix}. \quad (31)$$

Thus a good choice for the analysis filters will be

$$H(z) = B\Lambda(z)\Phi_x(z).$$
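As a quick numerical check, the RGB-to-YUV matrix B in (31) should be, to the printed precision, the inverse of the YUV-to-RGB matrix A in (29). The arrays below are copied from (29) and (31). The tolerances in the check are an assumption reflecting the 4 to 5 significant digits of the printed coefficients, and the sample pixel value is arbitrary.

```python
import numpy as np

# YUV -> RGB, eq. (29): x_rgb = A x_yuv
A = np.array([[1.0, 0.0, 1.402],
              [1.0, -0.34414, -0.71414],
              [1.0, 1.772, 0.0]])

# RGB -> YUV, eq. (31): x_yuv = B x_rgb
B = np.array([[0.299, 0.587, 0.114],
              [-0.1687, -0.3313, 0.5],
              [0.5, -0.4187, -0.0813]])

product = A @ B
print(np.round(product, 4))     # close to the 3 x 3 identity

rgb = np.array([200.0, 100.0, 50.0])
yuv = B @ rgb
print(np.round(A @ yuv, 2))     # recovers rgb up to coefficient rounding
```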

The power spectrum of the input may be approximated by assuming that the input signal can be modeled with a three-channel AR model of the form

$$x[i] = \sum_{m=1}^{p_1} C[m]\, x[i-m] + w[i]$$

in which the model coefficients C[m] are 3 × 3 matrices and the image pixels x[i] are vectors of length 3. The three-channel Yule-Walker equations are

$$R[k] - \sum_{m=1}^{p_1} C[m]\, R[k-m] = P_{p_1}\, \delta[k], \qquad k = 0, 1, \dots, p_1$$

The autocorrelation matrices are given by $R[k] = \mathrm{E}\{x[i+k]\, x^T[i]\}$. The solution of these equations gives the coefficients C[m] and the correlation matrix $P_{p_1}$.
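A minimal sketch of fitting the three-channel AR model by solving the Yule-Walker equations with sample autocorrelations follows. Assumptions not taken from the text: the pixels are visited in a single 1-D scan order (the index i above), the biased estimator R[k] = (1/L) Σ x[i+k]x^T[i] is used, and `autocorr` and `yule_walker` are hypothetical helper names. The synthetic AR(1) example only checks that known coefficients are recovered; it is not image data.

```python
import numpy as np

def autocorr(x, k):
    """Biased estimate of R[k] = E{x[i+k] x^T[i]} for k >= 0; x has shape (L, M)."""
    L = x.shape[0]
    return x[k:].T @ x[:L - k] / L

def yule_walker(x, p):
    """Solve R[k] = sum_{m=1}^{p} C[m] R[k-m], k = 1..p, for the M x M matrices C[m].

    Returns [C[1], ..., C[p]] and P = R[0] - sum_m C[m] R[m]^T.
    """
    x = x - x.mean(axis=0)
    M = x.shape[1]
    R = [autocorr(x, k) for k in range(p + 1)]

    def Rk(k):
        # R[-k] = R[k]^T for a real vector process
        return R[k] if k >= 0 else R[-k].T

    # Block Toeplitz system [R[1] ... R[p]] = [C[1] ... C[p]] T, with T[m, k] = R[k - m].
    T = np.block([[Rk(k - m) for k in range(1, p + 1)] for m in range(1, p + 1)])
    rhs = np.hstack([R[k] for k in range(1, p + 1)])
    Cs = np.linalg.solve(T.T, rhs.T).T
    C = [Cs[:, (m - 1) * M:m * M] for m in range(1, p + 1)]
    P = R[0] - sum(C[m - 1] @ R[m].T for m in range(1, p + 1))
    return C, P

# Synthetic check: simulate a stable 3-channel AR(1) process with unit-variance noise.
rng = np.random.default_rng(0)
C_true = np.array([[0.5, 0.1, 0.0],
                   [0.0, 0.4, 0.1],
                   [0.1, 0.0, 0.3]])
L = 20000
x = np.zeros((L, 3))
for i in range(1, L):
    x[i] = C_true @ x[i - 1] + rng.standard_normal(3)
C_est, P_est = yule_walker(x, p=1)
print(np.round(C_est[0], 2))
```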


Figure 1: Bit rate versus PSNR performance of the vector subband coding technique (VS), compared to the scalar subband coding (SS) of each vector component.
ABSTRACT

This paper focuses on the source coding of stereo image pairs. In particular, the conditional coder (CONCOD) and its properties are discussed. The problems associated with disparity compensation (DC), a widely used conditional coder, are introduced. Finally, a new conditional coding strategy, the Subspace Projection Technique (SPT), is proposed. The SPT is a transform-domain approach with a space-varying transformation matrix and may be interpreted as a spatial-transform domain representation of the stereo data.

1. INTRODUCTION 

The human brain can process the subtle differences between the images that are presented to the left and right eyes to perceive a three-dimensional outside world. This ability is called stereo vision. A stereoscopic system may be used to artificially stimulate the stereo vision ability. A stereo pair, a pair of images of the same scene acquired from two different perspectives, is presented to the observer so that the right image is seen by the right eye and the left image by the left eye. The human observer then perceives the scene in depth by processing the relative displacement, i.e., the disparity, of the objects between the two images of a stereo pair. There exists an inherent redundancy between the images of a stereo pair that can be exploited for efficient transmission and storage of stereo images.

2. CONDITIONAL CODER 

A stereo pair (X, Y) can be modeled as a vector-valued outcome of two correlated discrete random processes. Since stereo image communication is generally considered an optional extension to the basic monocular system, only a small percentage of the total bit rate is allocated to the second image (or, equivalently, to the source Y) [1, 2]. Moreover, at least one of the images should be decodable separately. One coding strategy that satisfies both conditions is the conditional coder (CONCOD), which may be described as “code one image and then code the second image given the coded first image”. The CONCOD structure is suboptimal in the sense that

$$R_X(D_X) + R_{Y|X}(D_Y) \le R_{X,Y}(D_X, D_Y) \le R_X(D_X) + R_{Y|\hat X}(D_Y), \quad (1)$$

where $R_X(D_X)$ is the rate-distortion function for X, $R_{X,Y}(D_X, D_Y)$ is the joint rate-distortion function for X and Y (the optimal coder), $R_{Y|X}(D_Y)$ is the conditional rate-distortion function for Y given the original X, and $R_{Y|\hat X}(D_Y)$ is the conditional rate-distortion function for Y given the encoded $\hat X$ (the conditional coder) [3]. An interesting property for very low bit rate coding of the second image follows if we rewrite Eq. (1) with a looser lower bound:

$$R_X(D_X) \le R_{X,Y}(D_X, D_Y) \le R_X(D_X) + R_{Y|\hat X}(D_Y). \quad (2)$$

For the extreme case in which we allocate zero bits to the second image (the distortion $D_Y$ for the second image is equal to its maximum value, i.e., $R_{Y|\hat X}(D_{Y_{\max}}) = 0$), the CONCOD structure is optimal, since the lower and upper bounds of Eq. (2) are identical. As a result of the continuity of the rate-distortion functions, for any $\epsilon > 0$ there exists a $\delta$ such that if $|D_{Y_{\max}} - D_{Y_s}| < \delta$ then

$$\big|R_X(D_X) + R_{Y|\hat X}(D_{Y_s}) - \big(R_X(D_X) + R_{Y|\hat X}(D_{Y_{\max}})\big)\big| = R_{Y|\hat X}(D_{Y_s}) < \epsilon. \quad (3)$$

However, since

$$R_{X,Y}(D_X, D_{Y_{\max}}) = R_X(D_X) + R_{Y|\hat X}(D_{Y_{\max}}), \quad (4)$$

$$R_{X,Y}(D_X, D_{Y_s}) \ge R_{X,Y}(D_X, D_{Y_{\max}}),$$

we can rewrite Eq. (3) in the following form:

$$\big|R_X(D_X) + R_{Y|\hat X}(D_{Y_s}) - R_{X,Y}(D_X, D_{Y_s})\big| < \epsilon, \quad (5)$$

which implies that the CONCOD structure performs arbitrarily close to the optimal solution, provided that $R_{Y|\hat X}(D_Y)$ is small.

This work was supported in part by the Joint Services Electronic Program, Grant No. DAAH-04-93-G-0027.
Based on this observation, conditional coding techniques that minimize the bit rate for the second image by exploiting the stereo redundancy while preserving a required quality have been investigated [1, 4]. Disparity information, for example, is used to displace pixels in a reference image to form a predicted frame. For some applications, the prediction error is transmitted to improve the quality of the coded images. This particular scheme is referred to as disparity compensated (DC) prediction and is a special case of the CONCOD structure. Note that the operational rate-distortion curve of any DC scheme is bounded from below by the theoretical rate-distortion curve of the CONCOD structure. In general, an intensity-based block matching algorithm is employed to obtain the disparity field [5]. For very low bit rate applications, a significant portion of the bit budget is spent on the disparity vectors, and DC-based algorithms suffer from several problems, including the failure of the constant intensity axiom, false predictions for the occlusion regions, and blocking artifacts [4, 6]. The rate-distortion and perceptual performance of stereo coding schemes may be improved if the characteristics of the human visual system and the physical properties of stereo imaging systems are exploited. In particular, the following properties are worth mentioning:

• The intensity of light that is reflected from an object and recorded by the camera depends on the position of the camera relative to the object [7]. Moreover, different camera characteristics may yield systematic luminance differences.

• In the transform domain, low-frequency coefficients are more correlated than high-frequency coefficients [3]. Moreover, low-frequency coefficients are perceptually important [3].

• Exploiting and alleviating the interblock redundancy of the transform domain coefficients diminishes the visibility of blocking artifacts [8].

The next section describes a coding technique that takes these observations into account.

3. SUBSPACE PROJECTION TECHNIQUE 

We recently proposed a new approach to stereo image coding that performs disparity compensation in a spatial-transform domain framework [4]. The novelty of the proposed approach is that an incomplete transform basis is used for transform domain compensation and a multiplicative gain factor is used for spatial domain compensation. The proposed algorithm, called the Subspace Projection Technique (SPT), may be interpreted as a data-dependent block transformation.

The SPT obtains an estimate for each m × m block $b_r$ of the right image (or, equivalently, each $m^2$-dimensional vector in the Euclidean space $\mathbb{R}^{m^2}$) by post-processing the prediction $\hat b_r$ that is obtained through disparity compensation (DC). The SPT algorithm does not specify the disparity compensation technique. One can obtain disparity vectors using fixed block size DC [5], variable block size DC (VDC) [1], windowed DC (WDC) [2], or hierarchical block matching techniques [9].

As with many data compression algorithms, our motivation is to find an efficient representation for each vector $b_r$ so that we can transmit or store the given data with fewer bits. The idea of the SPT is to achieve this objective by creating a suitable transformation, T, from the Euclidean space $\mathbb{R}^{m^2}$ to a proper subspace S. We consider the use of a block-varying transformation in order to exploit the stereo redundancy explicitly. Therefore, the prediction $\hat b_r$ is included in the spanning set of the subspace S, together with a set of fixed orthogonal polynomial vectors $V = \{v_i\}_{i=1}^{N-1}$. The choice of polynomial vectors is motivated by the computational savings they introduce and by the smooth intensity variations found in most natural images. Moreover, the polynomial vectors yield a good approximation to the incomplete discrete cosine transform [10].

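The transform domain representation for the vector $b_r$ is given by:

$$T(b_r) = \gamma\, b_o + \sum_{i=1}^{N-1} \alpha_i\, v_i, \quad (6)$$

where $b_o$ is the component of the prediction $\hat b_r$ orthogonal to the fixed vectors, and $\gamma$ and the $\alpha_i$'s are the projection coefficients, obtained by:

$$\alpha_i = \frac{\langle b_r, v_i \rangle}{\langle v_i, v_i \rangle}, \qquad i = 1, 2, \dots, N-1 \quad (7)$$

$$\gamma = \frac{\langle b_r, b_o \rangle}{\langle b_o, b_o \rangle} \quad (8)$$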
The coefficient $\gamma$ is a multiplicative gain factor that adapts to local changes in the cross-correlation statistics of the stereo data. An average gain of 1-2 dB is achieved over the estimate that assumes $\gamma = 1$. A typical histogram of the coefficient $\gamma$ is presented in Fig. 1. Although the location of the histogram peak is image dependent, typical values range between 0.8 and 1.1. For this example, the peak occurs at 0.9.
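The projection in (6)-(8) can be sketched for a single m × m block as follows. Assumptions not fixed by the text: the fixed vectors are the three first-order polynomial vectors (constant, horizontal ramp, vertical ramp), which are mutually orthogonal on a centred grid; `b_hat` stands for the disparity-compensated prediction of the block; quantization and the subband coding of the coefficients are omitted. `polynomial_vectors` and `spt_block` are hypothetical names, and the synthetic block only imitates a gain change plus a luminance offset.

```python
import numpy as np

def polynomial_vectors(m):
    """First-order orthogonal polynomial vectors (constant, x-ramp, y-ramp) for an m x m block."""
    r = np.arange(m) - (m - 1) / 2.0          # centred coordinates make the ramps zero-mean
    yy, xx = np.meshgrid(r, r, indexing="ij")
    return [np.ones(m * m), xx.ravel(), yy.ravel()]

def spt_block(b_r, b_hat, V):
    """Return (gamma, alphas, estimate) for one flattened block, following (6)-(8).

    b_r:   actual right-image block (length m*m)
    b_hat: its disparity-compensated prediction (length m*m)
    V:     list of mutually orthogonal fixed vectors v_i
    """
    alphas = np.array([(b_r @ v) / (v @ v) for v in V])             # (7)
    # b_o: component of the prediction orthogonal to every fixed vector
    b_o = b_hat - sum((b_hat @ v) / (v @ v) * v for v in V)
    bo_energy = b_o @ b_o
    gamma = (b_r @ b_o) / bo_energy if bo_energy > 0 else 0.0       # (8)
    estimate = gamma * b_o + sum(a * v for a, v in zip(alphas, V))  # (6)
    return gamma, alphas, estimate

rng = np.random.default_rng(0)
m = 8
V = polynomial_vectors(m)
b_hat = rng.uniform(0, 255, m * m)                    # stand-in DC prediction
b_r = 0.9 * b_hat + 12.0 + rng.normal(0, 2, m * m)    # gain change, offset, noise
gamma, alphas, est = spt_block(b_r, b_hat, V)
print(round(gamma, 3), np.round(alphas, 2))
```

Because $b_o$ and the $v_i$ are mutually orthogonal, (6) is the orthogonal projection of $b_r$ onto their span, so the residual $b_r - T(b_r)$ is orthogonal to every basis vector.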

The fixed subspace estimate $\sum_{i=1}^{N-1} \alpha_i v_i$ is the low-frequency component of the block $b_r$. In general, either one (zero-order approximation), three (first-order approximation), or six (second-order approximation) fixed vectors are used. We employ DC in the transform domain for the fixed subspace coefficients. This is equivalent to correcting for systematic (local and global) luminance differences. The disparity-compensated predictions of these coefficients are given by:

$$\hat\alpha_i = \frac{\langle \hat b_r, v_i \rangle}{\langle v_i, v_i \rangle}, \qquad i = 1, \dots, N-1 \quad (9)$$

We transmit the prediction error, $e_i = \alpha_i - \hat\alpha_i$.

Figure 1: Histogram for the multiplicative gain factor (plot title: “Histogram of the Multiplicative Gain”; label: “room”).

There are two implementations of the SPT. The first is to use a large projection block size (e.g., 16-by-16) to obtain an initial estimate and to code and transmit the residual $b_r - T(b_r)$. The second is to use a smaller projection block size (e.g., 4-by-4) and to quantize the coefficients $e_i$ and $\gamma$ using a subband coder, so that the spatial correlation between the coefficients of neighboring blocks (inter-block redundancy) is exploited and the visibility of blocking artifacts diminishes [8]. A locally optimal bit allocation scheme is employed to determine the quantizer for each coefficient [2].


4. EXPERIMENTAL RESULTS 

This section presents a comparison of several stereo 
image coding algorithms. In the experiments, we code 
the left image independently and the right image based 
on the coded left image. 

Three different stereo pairs are used for the experiments. The first pair was obtained from the “flower garden” sequence (frame numbers 0 and 2). These images are 352-by-240. The second is the “Lab” pair (512-by-480), which was obtained by shifting a video camera and taking two pictures of a stationary scene from two horizontally shifted locations. Finally, the third one is the “Room” pair (256-by-256), which was obtained using the same procedure as the “Lab” pair.

method   f. garden   lab     room
DC       22.97       28.84   23.47
WDC      23.24       29.28   23.80
VDC      23.53       29.91   24.16
SPT      23.70       30.87   26.83
V-SPT    24.11       31.51   27.05
W-SPT    23.95       31.13   27.13

Table 1: Rate-distortion performance of the stereo image coding algorithms. The PSNRs of the encoded images (in dB) are presented at a fixed bit rate (0.06 bpp for the lab and room pairs and 0.05 bpp for the flower garden pair).

First, the performance of several stereo image coding algorithms at very low bit rates (around 0.05 bpp) is compared. These include fixed block size (8-by-8) disparity compensation (DC), windowed DC (WDC), variable block size DC (VDC), the subspace projection technique (SPT) using fixed block size (16-by-16) DC estimates, SPT using variable block size DC estimates (V-SPT), and SPT using WDC estimates (W-SPT). An average gain of 0.75 dB over DC is achieved for the given pairs by using VDC; occlusion regions are better estimated by this method. Similar gains are obtained by V-SPT. WDC improves the PSNR by 0.32 dB on average over DC for the test data. Moreover, blocking artifacts are removed by the windowing operation. All implementations of SPT are found to be superior to the DC-based techniques in the rate-distortion sense. The quality of the images coded by the SPT can be improved either by decreasing the projection block size or by increasing the number of fixed vectors, if a small increase in the bit rate or computational complexity is allowed. For example, for the lab image, one can obtain 31.2 dB (at the same bit rate) if six fixed vectors are used instead of three. The same PSNR can be obtained with three fixed vectors if the projection block size, m, is chosen to be 4. In Fig. 2, we present the effect of increasing the number of fixed vectors. At low bit rates both implementations have similar performance¹. Therefore, the one with the lower computational complexity is preferred.

¹ In theory, increasing the number of fixed vectors improves the rate-distortion performance at all bit rates. However, a locally optimal bit allocation scheme was used for this particular example. As a result, the implementation with three fixed vectors performs slightly better than the implementation with six fixed vectors at very low bit rates.

Figure 2: Effects of the number of fixed vectors on the performance of the SPT.

Second, the performance of the following coding algorithms at low bit rates (around 0.15 bpp) is investigated: disparity compensated (n = 16) residual coding (DC-RC), VDC, windowed DC-RC (WDC-RC), SPT compensated residual coding (n = 16, m = 16) (SPT-RC), and W-SPT-RC. It is easier to code the residual for windowed techniques. In fact, WDC improves the coding results both perceptually and in the rate-distortion sense. It is better practice to increase the bit rate for the residual than to increase the bit rate for the disparity field. However, slightly increasing the bit rate for the disparity field using the VDC method and then coding the residual field is a feasible solution. Different implementations of SPT are superior to the DC-based techniques.

method     f. garden   lab     room
DC-RC      24.80       32.91   29.19
SPT-RC     25.84       33.40   30.24
VDC        24.00       30.10   24.72
WDC-RC     25.43       33.15   29.65
WSPT-RC    26.07       34.00   30.55

Table 2: Rate-distortion performance of the stereo image coding algorithms. The PSNRs of the encoded images (in dB) are presented at a fixed bit rate (0.15 bpp for the lab pair, 0.16 bpp for the flower garden pair, and 0.15 bpp for the room pair).

Finally, the operational rate-distortion curves for the DC and the SPT algorithms are compared for the “Lab” stereo pair. The SPT is implemented in the two different ways described in the previous section. For the first implementation of the SPT, a 4-by-4 projection block size is used, no residual is coded, the projection coefficients are quantized using a subband coder, and a locally optimal bit allocation scheme is employed as described in [2]. For the second implementation, a 16-by-16 block size is used for both disparity estimation and subspace projection, and the residual is coded using a subband coder. DC is implemented using 16-by-16 blocks for estimation, and the residual due to the disparity compensation is coded with a subband coder. The rate-distortion curves are presented in Figure 3. At low bit rates, the first implementation of the SPT is 3 dB better than disparity compensation and 2 dB better than the second implementation of the SPT. Another interpretation of this result is that a bit rate reduction of 60% over DC can be achieved. For bit rates higher than 0.15 bpp, the second implementation of the SPT is preferable. Note that the maximum PSNR achieved by the first SPT implementation is bounded from above since, even without quantization, the algorithm cannot achieve exact reconstruction. The original and coded images are presented in Figs. 4 and 5, respectively.

Figure 3: A comparison of the rate-distortion performance of the stereo coding algorithms.

5. CONCLUSION 

This paper summarizes our recent work on stereo image coding. We propose a new coding technique based on subspace projection. The novelty of the approach is that the transformation matrix of the projection operation adaptively changes to exploit the inherent stereo redundancy and the non-stationary cross-correlation characteristics between the images of a stereo pair. In addition, we used a combined transform-subband coding scheme that is very efficient for coding transform domain coefficients. The subspace projection technique is appealing since its performance at very low bit rates was found to be superior to that of the standard stereo coding algorithms. The proposed coder is flexible: it can operate over a broad range of bit rates and does not require training.

Figure 4: Original right image.
ABSTRACT

A block-based motion estimation (BBME) strategy that uses blocks of adaptive size for encoding video sequences is presented. A novel joint strategy for vector propagation and adaptive correction has been implemented that allows both an efficient level-to-level update of the motion field and an effective recovery from wrongly estimated vectors. The developed algorithm yields a spatial quantization of the motion field that is locally variable and allows regions with uniform motion to be segmented into big blocks. Compared with classical BBME, the method reduces both the bit rate and the computational load while keeping the reconstruction quality nearly constant.

1. INTRODUCTION 

In video coding, motion compensation techniques predict
future frames of the video sequence using a previously
computed estimate of the objects' motion. In
Recommendation H.261 [1], as well as in MPEG [2,3] and in
some other video coding techniques, the motion field is
computed blockwise, by a spatial quantization of the dense
optical flow into a few displacement vectors associated with
square blocks. Each vector is usually estimated by
comparing the block of the future frame with every
displaced block of the current frame that falls within a
predefined search window (full search). The displacement
giving the lowest distortion measure is selected.
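The full search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes grayscale frames stored as NumPy arrays and uses the mean absolute difference (MAD) as the distortion measure, since the text leaves the measure generic. The function name `full_search_bbme` is hypothetical. Each block tests (2W+1)^2 candidates of B^2 pixels, which is the origin of the full-search cost formula in Section 2.

```python
import numpy as np

def full_search_bbme(cur, fut, B=16, W=16):
    """Hypothetical full-search block matching with the MAD criterion.

    For every B-by-B block of the future frame `fut`, every displacement
    (dy, dx) with |dy|, |dx| <= W is tested in the current frame `cur`;
    the one with the lowest mean absolute difference is kept.
    Returns {(block_row, block_col): (dy, dx)}.
    """
    rows, cols = fut.shape
    vectors = {}
    for by in range(0, rows - B + 1, B):
        for bx in range(0, cols - B + 1, B):
            block = fut[by:by + B, bx:bx + B].astype(np.int32)
            best_mad, best_d = None, (0, 0)
            for dy in range(-W, W + 1):
                for dx in range(-W, W + 1):
                    y, x = by + dy, bx + dx
                    # Skip candidate blocks that fall outside the current frame.
                    if y < 0 or x < 0 or y + B > rows or x + B > cols:
                        continue
                    cand = cur[y:y + B, x:x + B].astype(np.int32)
                    mad = np.abs(block - cand).mean()
                    if best_mad is None or mad < best_mad:
                        best_mad, best_d = mad, (dy, dx)
            vectors[(by // B, bx // B)] = best_d
    return vectors

# Usage: shift a random frame by a known amount and recover the vector.
rng = np.random.default_rng(0)
cur = rng.integers(0, 256, size=(48, 48))
fut = np.roll(cur, shift=(2, -3), axis=(0, 1))  # fut[y, x] = cur[y - 2, x + 3]
print(full_search_bbme(cur, fut, B=16, W=4)[(1, 1)])  # (-2, 3)
```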

In this paper, a new motion estimation method for video
coding is presented. The method yields a motion field that
is spatially quantized with blocks of variable size, each
enclosing a region of constant motion, so that a large
number of motion vectors are merged into large blocks. The
estimation is performed by processing the motion at several
levels of resolution and by choosing, level by level and on
a local basis, the block size that fits the motion field. The
initial estimation is obtained by a full-search BBME at the
coarsest level of a multiresolution pyramid, whereas at the
following levels a novel adaptive strategy performs only the
necessary corrections to the field propagated from the
previous level. The strategy propagates and corrects the
motion field across the levels as follows. Starting from the
coarse spatial quantization of the motion field with large
blocks obtained at the first level, it decides which regions
require a finer spatial quantization with smaller blocks. For
such regions, the strategy initializes new displacement
vectors at the next resolution level and decides for each
vector the required amount of correction. It is thus possible
to tailor the correction to each vector, and to perform
actions ranging from robust error recovery to a small
correction of previously estimated vectors; both the
computational load and the number of noisy vectors are
reduced. For the regions where a denser spatial
quantization is not needed, the blocks do not generate new
blocks; they are simply propagated, their size increasing
level by level, while the related displacement is refined.
This joint strategy of propagation and correction acts
locally and yields a non-uniform spatial quantization of the
motion field. With respect to traditional BBME with
fixed-size blocks, these fewer vectors are sufficient to
achieve nearly the same reconstruction quality, because
they are placed according to the motion activity in the
scene. Moreover, the computational load is reduced: first,
because the number of vectors is reduced, and second,
because the number of computations for each vector is
tuned on the basis of the local field.

2. HIERARCHICAL BBME

Classical BBME has the serious drawback of a huge
computational load. If an image is divided into N and M
square blocks of size B in the two spatial directions, and a
search window of ±W is used, the number of elementary
operations (EOPs) required for full search is:

$$A_f = MNB^2(2W+1)^2$$

For a CIF-format luminance picture with B = 16 and
W = 16, the number of operations grows to about
$A_f \approx 1.1 \times 10^8$. For this reason, full-search BBME
cannot easily be used for real-time encoding of video
sequences, and fast search methods have been studied to
increase the speed of the estimation task. In recent years,
several works based on a hierarchical approach to BBME
have been presented [4,5]. Hierarchical methods are well
suited for motion estimation, as they give robust motion
fields and can lower the computational load of full-search
BBME by up to two orders of magnitude. Consider an
L-level multiresolution representation (e.g., the Gaussian
pyramid introduced in [6]). It includes the original image
as the (0)-th level and its L−1 approximations at a dyadic
sequence of resolutions. At the (L−1)-th level, BBME is
carried out with blocks of dimension B, using a full search
in a window that is reduced by a factor $2^{L-1}$ with respect
to the full-resolution case (i.e., the search is done within
$\pm W2^{-(L-1)}$). At the coarsest level, the number of blocks
in each spatial direction is reduced by the same factor,
giving $M2^{-(L-1)}$ and $N2^{-(L-1)}$ respectively. Thus, only
$A_{L-1}$ elementary operations are required at the (L−1)-th
level for the coarsest BBME, and $A_l$ for the corrections in
each of the remaining levels ($l = L-2, \ldots, 0$). Since the
main displacement has already been estimated at the
coarsest level, only a reduced offset $v_l \ll W2^{-l}$ is used in
$A_l$, because it needs to correct only scale errors; the
resulting total number of elementary computations $A_H$ is
much lower than $A_f$ (i.e., the cost of the equivalent full
search at full resolution). For $v_l = v$ constant across the
levels:

$$A_{L-1} = MN\,2^{-2(L-1)}B^2\left(2W2^{-(L-1)}+1\right)^2$$

$$A_l = MN\,2^{-2l}B^2(2v+1)^2$$

$$A_H = A_{L-1} + \sum_{l=0}^{L-2} A_l = A_{L-1} + \tfrac{4}{3}MNB^2(2v+1)^2\left(1-2^{-2(L-1)}\right)$$
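The cost expressions above are easy to evaluate numerically. The sketch below uses hypothetical helper names and assumes a CIF luminance picture of 352×288 pixels, divided into 22×18 blocks of size B = 16. With L = 3 and v = 2 it reproduces the figures quoted in the next paragraph: about 3.7×10^6 operations and a gain of about 30 over the full search.

```python
def full_search_eops(M, N, B, W):
    """A_f = M N B^2 (2W+1)^2: every pixel of every block, for every candidate."""
    return M * N * B**2 * (2 * W + 1) ** 2

def hierarchical_eops(M, N, B, W, L, v):
    """A_H = A_{L-1} + sum_{l=0}^{L-2} A_l, with a constant correction offset v."""
    s = 2 ** (L - 1)
    # Coarsest level: (M/s) x (N/s) blocks of size B, full search within +/- W/s.
    a_coarse = (M / s) * (N / s) * B**2 * (2 * W / s + 1) ** 2
    # Correction levels l = L-2, ..., 0: (M 2^-l) x (N 2^-l) blocks, window +/- v.
    a_corr = sum((M / 2**l) * (N / 2**l) * B**2 * (2 * v + 1) ** 2
                 for l in range(L - 1))
    return a_coarse + a_corr

# CIF luminance (352 x 288) with B = 16: 22 x 18 blocks.
a_f = full_search_eops(22, 18, 16, 16)
a_h = hierarchical_eops(22, 18, 16, 16, L=3, v=2)
print(a_f, a_h, round(a_f / a_h))  # 110398464 3681216.0 30
```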

In the case of a hierarchical BBME for a CIF sequence,
using an L = 3 multiresolution pyramid and a correction of
v = 2 at each level, the number of computations is about
$3.7 \times 10^6$, for a gain of 30, whereas the gain becomes
65 using v = 1, and 92 using b = 4 and v = 1. However, the
large gain of hierarchical BBME yields a final estimate
similar to that of the full search only if the first estimate is
coherent with the true motion at that resolution, even if it
is coarsely spatially quantized. Indeed, a block of size B at
the (L−1)-th level is equivalent to a block of size $2^{L-1}B$ at
the (0)-th resolution level. If the block contains several
objects with different motions, the first estimation gives a
vector that minimizes the error at the (L−1)-th level but
requires large corrections at the next levels. In this
situation, a small value of v cannot recover from the false
estimates made at the earlier levels. On the other hand, a
larger value of v allows a better correction but a lower
computational gain. One solution is to improve the
propagation of the motion field toward the next level of the
pyramid. A vector at the (l)-th level generates four new
vectors at the (l−1)-th level, each associated with a block
of dimension B, like its parent at the previous level. Instead
of replicating the value of the parent vector (multiplied by 2
because of the resolution increase), in [7] the motion field is
propagated by searching for the best initial vector among
the neighboring vectors at the previous level. In this way, a
bad propagation from the parent vector can be recovered if
the child block has motion activity similar to that of any
spatially adjacent block. In other words, the high spatial
correlation of the motion field is exploited for error
recovery.

3. PROPAGATION OF THE MOTION FIELD IN 
HIERARCHICAL BBME 

Hierarchical BBME methods can process motion activity
at different scales, and they are well suited for reducing
the computational load. However, the initial coarse
estimation must be properly propagated and corrected at
the finer levels. Moreover, the introduction of variable-size
blocks in BBME helps to reduce both the computational
load and the number of motion-field vectors that need to be
coded. Thus, a hierarchical BBME scheme that aims at
reducing both the number of vectors and the computational
load, without degrading the reconstruction, must have the
following properties. First, it must be able to recover from
earlier incoherent estimations. This can be achieved by a
proper choice of the vector field to be refined at each level,
and by setting the correction window adaptively on the
basis of the required correction. Second, the refinement of
the spatial resolution of the motion field should stop when
smaller blocks do not bring a substantial improvement in
the reconstruction. These properties improve both the
estimation and the coding performance. Indeed, stopping
the propagation leaves fewer vectors to code, whereas a
good joint propagation and correction strategy yields
smooth motion fields with fewer noisy vectors, so that
entropy coding of the field may give better results.

The multiresolution representation used is the L-level
Gaussian pyramid [6], already introduced in Section 2.
Here, the first BBME is performed at the (L−1)-th level
with B-sized blocks, and the resulting motion field is
evaluated block by block for its propagation toward the
next level and for its correction. The propagation and
correction of the motion field are performed until the finest
resolution is reached at the (0)-th level. Let $b_{B,l}(i,j)$ be the
generic B-sized block at the (l)-th level; there are $M2^{-l}$ and
$N2^{-l}$ blocks in the two spatial directions, thus
$i \in [1, \ldots, N2^{-l}]$ and $j \in [1, \ldots, M2^{-l}]$. If the
compensation for the block $b_{B,l}(i,j)$ is not satisfactory, the
block generates four new blocks at the next level, whose
displacements are corrected according to their needs.
These four B-sized blocks enclose the same area of the
scene as $b_{B,l}(i,j)$, because they lie on the (l−1)-th level,
where the scale is doubled, and they improve the estimation
by means of a finer spatial quantization.

Otherwise, if the compensation of $b_{B,l}(i,j)$ is satisfactory,
the displacement is projected directly onto the (0)-th level,
where it is applied to a block of dimension $B2^{l}$, and the
estimation for the area it encloses is stopped.

3.1. Adaptive Size Block Generation 

Each block $b_{B,l}(i,j)$ is made up of four sub-blocks of size
B/2 at the same level: $b_{B/2,l}(2i,2j)$, $b_{B/2,l}(2i+1,2j)$,
$b_{B/2,l}(2i,2j+1)$, and $b_{B/2,l}(2i+1,2j+1)$. The MAD of the
compensation is evaluated for each of these by using
$d_{B,l}(i,j)$, i.e., the same displacement vector as the block
$b_{B,l}(i,j)$. If at least one of them has a MAD higher than a
fixed threshold $T_S$, the block $b_{B,l}(i,j)$ is split and
generates four new blocks at the (l−1)-th level:
$b_{B,l-1}(2i,2j)$, $b_{B,l-1}(2i+1,2j)$, $b_{B,l-1}(2i,2j+1)$, and
$b_{B,l-1}(2i+1,2j+1)$. The new blocks are given four suitable
initial displacement vectors, which are the starting points
for the correction of the motion field in this area. The split
and update process is iterated across levels (see Fig. 1)
until the compensation error obtained by stopping the
spatial segmentation of the motion field at the block
$b_{B,l}(i,j)$ is satisfactory. This condition should be checked
at the (0)-th level for every block, by displacing the block
$b_{2^{l}B,0}(i,j)$ of dimension $2^{l}B$, which is the projection of
$b_{B,l}(i,j)$ onto the finest resolution. However, to avoid
useless computations, the (0)-th level compensation for
verifying a propagation stop is performed only for a block
that shows good reconstruction at the (l)-th level. Such a
block should contain constant motion for every pixel, and
its reconstruction error should be uniformly distributed
inside it. The MADs computed inside its sub-blocks are
used for this purpose, since, unlike higher-order moments,
they require no further computation. In conclusion, the
MADs of the sub-blocks are compared to the threshold, and
if all of them are lower than $T_S$, the block is projected
onto the finest resolution level, because there is a
propagation-stop hypothesis to be checked.
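The split test just described can be sketched as below. The helper names are hypothetical, and the caller is assumed to keep the displaced block inside the frame. All four B/2 sub-blocks are compensated with the parent vector $d_{B,l}(i,j)$. The block is split as soon as one sub-block MAD exceeds the threshold; otherwise it becomes a propagation-stop candidate.

```python
import numpy as np

def sub_block_mads(cur, fut, y, x, size, d):
    """MAD of each size/2 sub-block of the future-frame block at (y, x),
    all compensated with the same parent displacement d = (dy, dx).
    Order: top-left, top-right, bottom-left, bottom-right."""
    h = size // 2
    dy, dx = d
    mads = []
    for oy in (0, h):
        for ox in (0, h):
            f = fut[y + oy:y + oy + h, x + ox:x + ox + h].astype(np.int32)
            c = cur[y + oy + dy:y + oy + dy + h,
                    x + ox + dx:x + ox + dx + h].astype(np.int32)
            mads.append(float(np.abs(f - c).mean()))
    return mads

def should_split(mads, T_s):
    """Split if at least one sub-block exceeds T_s; otherwise the block is
    a candidate for a propagation stop (to be checked at level 0)."""
    return any(m > T_s for m in mads)
```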


Figure 1. An example of the adaptive size BBME. [Diagram: current frame and future frame; blocks marked SATISFACTORY/UNSATISFACTORY; displacement propagation and corrections.]

To perform this check, the vector is scaled by a factor $2^{l}$,
and the compensation is evaluated with the best vector
$d_{2^{l}B,0}(i,j)$ that lies within $\pm 2^{l-1}$, i.e., a resolution
window that allows pixel accuracy to be reached at the
(0)-th level (see Fig. 2). Within the resolution window, the
MAD turns out to be a monotonic function of the
displacement; thus, the global minimum can be reached by
using a fast search method for BBME. In our work we have
used a modified version of the algorithm proposed in [8],
which finds the best displacement while computing a
reduced number of EOPs. If the compensation is
satisfactory, the block $b_{B,l}(i,j)$ is not split further, and just
one vector is used for its displacement field instead of $2^{2l}$
vectors. This gives a considerable gain in coding efficiency,
since $4 + 4^2 + \cdots + 4^{l} = \tfrac{4}{3}(4^{l}-1)$ vectors are avoided
by stopping the propagation of a block at the (l)-th level.
The mechanism creates blocks of different dimensions that
adaptively cover the future frame, with a higher
concentration of vectors where a denser displacement field
is needed. However, since the motion field is not spatially
quantized with a regular block arrangement, the positions
of the blocks inside the future frame must be coded. The
propagation of the blocks within the Gaussian pyramid is
equivalent to a quad-tree growing process whose root is the
parent block at the (L−1)-th level. This makes it possible to
code the side information needed to place the blocks very
compactly: it is only necessary to code each split decision
with one bit. Based on this information, the decoder can
reconstruct the quad-tree and place the blocks, with their
related vectors, on the future frame to perform the
compensation.
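The one-bit-per-split-decision side information can be illustrated with a small quad-tree coder. This is a sketch under stated assumptions, with hypothetical function names. A node is either a leaf motion vector (a tuple) or a list of four children. No flag is sent at level 0, on the assumption that blocks there cannot be split further. Leaf vectors are assumed to be sent separately in the same depth-first order, and their entropy coding is not shown.

```python
def encode_split_flags(node, level):
    """Depth-first split flags: 1 = split into four children, 0 = stop."""
    if level == 0:
        return []  # finest level: no split decision to signal
    if isinstance(node, list):
        flags = [1]
        for child in node:
            flags += encode_split_flags(child, level - 1)
        return flags
    return [0]

def leaf_vectors(node):
    """Leaf vectors in the same depth-first order as the flags."""
    if isinstance(node, list):
        return [v for child in node for v in leaf_vectors(child)]
    return [node]

def decode_split_flags(flags, level, vectors):
    """Rebuild the quad-tree from iterators over flags and leaf vectors."""
    if level == 0 or next(flags) == 0:
        return next(vectors)
    return [decode_split_flags(flags, level - 1, vectors) for _ in range(4)]

# Root at level L-1 = 2; only the second quadrant is split down to level 0.
tree = [(1, 0), [(0, 1), (0, 2), (0, 3), (0, 4)], (2, 0), (3, 0)]
flags = encode_split_flags(tree, 2)
print(flags)  # [1, 0, 1, 0, 0]
```

In this example, five bits place seven vectors. A regular level-0 grid covering the same area would need 16 vectors.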


Figure 2. Example of scaling from the 2nd to the (0)-th level, with the related resolution window for testing the compensation quality. [Diagram: current frame and future frame; scaled displaced block with correction due to resolution increase.]


3.2. Vector Propagation Strategy 

The propagation of a vector from the generic (l)-th level
differs depending on whether the related compensation is
satisfactory or not. In the first case, the vector is first
expanded onto the (0)-th level, and then the best
$d_{2^{l}B,0}(i,j)$ within the correction window of Fig. 2 is taken.
In the other case, four new vectors are propagated to the
(l−1)-th level as follows: (i) before the correction phase
begins, each vector in the set of neighbors at the (l)-th level
is scaled to the (l−1)-th level and used to compensate the
associated block; the vector with the lowest MAD is chosen
as the starting point; (ii) during the correction phase at the
(l−1)-th level, the previously initialized vector is again
compared with its neighbors on the same (l−1)-th level that
have already been corrected, and the one giving the lowest
MAD is taken. The first step exploits the neighborhood of
the motion field computed at the (l)-th level to recover from
a wrongly estimated motion vector. If a B-sized block at the
(l)-th level contains several objects with different motions,
the estimated vector that minimizes the MAD may not
correspond to the real motion. In this case, if a sub-block
contains just one of these objects and an adjacent block is
completely covered by the same object (so that its
displacement estimate is correct), then the sub-block can
start from the true displacement for its motion correction at
the (l−1)-th level. The second step allows the correction to
start from more accurate initial vectors if the neighboring
blocks at the same level have similar motion. Indeed, the
correction needs to be performed only for the first block
that covers an area with constant motion (e.g., an object
moving against a still background); the same correction is
then taken up by the neighboring blocks that cover the
same area.
Figure 3. First, the initial vector for block C is chosen from the previous level as the best among those of blocks {A, B, C, D}. Then, it is compared with the already refined neighbor vectors.
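The two-step initialization of Section 3.2 can be sketched as follows. The names are hypothetical, the MAD is computed on integer pixels, and a candidate that would leave the frame is given an infinite cost (an assumption; the text does not discuss borders). Step (i) doubles each coarse-level neighbor vector to account for the resolution increase. Step (ii) compares that choice with vectors already corrected at the same level.

```python
import numpy as np

def block_mad(cur, fut, y, x, B, d):
    """MAD between the B-by-B future-frame block at (y, x) and the
    current-frame block displaced by d = (dy, dx); inf if it leaves the frame."""
    dy, dx = d
    rows, cols = cur.shape
    if y + dy < 0 or x + dx < 0 or y + dy + B > rows or x + dx + B > cols:
        return float("inf")
    f = fut[y:y + B, x:x + B].astype(np.int32)
    c = cur[y + dy:y + dy + B, x + dx:x + dx + B].astype(np.int32)
    return float(np.abs(f - c).mean())

def initial_vector(cur, fut, y, x, B, coarse_neighbors, refined_neighbors=()):
    """Step (i): scale the (l)-th level neighbor vectors by 2 and keep the
    one with the lowest MAD on this block. Step (ii): compare that choice
    with neighbors already corrected at the (l-1)-th level."""
    cost = lambda d: block_mad(cur, fut, y, x, B, d)
    start = min(((2 * dy, 2 * dx) for dy, dx in coarse_neighbors), key=cost)
    return min([start, *refined_neighbors], key=cost)
```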


3.3. Adaptive Size Setting for the Search Window 

With an initialization strategy such as the one described in
the previous subsection, the initial vectors are as close as
possible (or even equal) to the target vector. Thus, to
reduce both the computational load and the risk of falling
into a false displacement, the correction window should be
set so that its size is proportional to the amount of
correction required. For instance, if the MAD of the initial
vector is large (i.e., the compensation is not satisfactory), a
small correction may not be enough and even a new
estimation may be required; but if the compensation is
already satisfactory after the initialization phase alone, no
further search is needed. To modulate between these two
extreme situations, the width of the search window is set on
the basis of the compensation MAD of the initial vector; in
particular, the highest of the MADs of the four sub-blocks is
considered. If the initial MAD is lower than the threshold
$T_S$, the hypothesis for a propagation stop is fulfilled, no
further correction is needed, and the search window is set
to zero. Otherwise, if it is higher than an upper threshold
$T_U$, a simple correction is not enough and a new estimation
is performed: the initial vector is set to zero and the window
size is set to the maximum displacement allowed at that
level, i.e., $\pm W2^{-l}$. In all other cases, the correction window
size is set proportional to the initial MAD value. With this
strategy, it is possible either to recover from bad
estimations or to perform only small corrections on already
satisfactory estimations, using an adaptively set
search-window width.
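The three-way rule for the search window can be written as a small function. The name `correction_window` is hypothetical. The text says only that the intermediate window is "proportional to the initial MAD". The scaling used here, $\lceil w_{max} \cdot MAD / T_U \rceil$ clipped to $[1, w_{max}]$, is an assumption for illustration.

```python
import math

def correction_window(mad_init, T_s, T_u, level, W=16):
    """Return (restart_from_zero, half_width) of the correction search.

    mad_init is the largest of the four sub-block MADs of the initial vector.
    """
    w_max = W // 2 ** level      # maximum displacement at this level: +/- W 2^{-l}
    if mad_init < T_s:
        return False, 0          # propagation-stop hypothesis: no search needed
    if mad_init > T_u:
        return True, w_max       # new estimation, starting from the zero vector
    # Intermediate case: window proportional to the initial MAD (assumed scaling).
    return False, max(1, min(w_max, math.ceil(w_max * mad_init / T_u)))
```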


4. EXPERIMENTAL RESULTS 

The proposed adaptive BBME reaches the same
reconstruction quality as full-search BBME with a
reduced number of vectors. Fig. 5 shows the results for
the standard CIF sequence Salesman. The full-search
BBME has been performed with block size B = 16 within a
search window of W = ±16. For the hierarchical BBME, the
number of levels of the Gaussian pyramid has been set to
L = 3, the minimum block dimension to B = 16, and the
initial search window at the coarsest level to ±4. Since ±4
at the coarsest level corresponds to ±16 at full resolution,
the hierarchical method can reach the same displacements
as the full search, and it allows blocks of dimensions
$B_0 = 16$, $B_1 = 32$ and $B_2 = 64$. Two settings have been
chosen for $T_S$ and $T_U$, giving different performance. The
first achieves a reconstruction quality as close as possible
to that of the full search, but with fewer vectors. The
second achieves a larger reduction in the number of
vectors, at the cost of a reconstruction quality slightly
lower than that of the full search. The diagrams compare
the PSNR of the achieved compensations and the bitrate of
the motion field, entropy-coded with adaptive arithmetic
coding [9]. The PSNR values are very similar for each
considered compensation, whereas the bitrate of the
adaptive method is up to a factor of two lower than that of
the full-search method. Finally, the compensations between
the 2nd and the 4th frames of Salesman are shown: the
compensated 4th frame, its DFD (with an offset of 128 and
quantized with 1 bpp), and the related motion field. The
full search is compared with the adaptive method under two
different threshold settings. The reconstruction quality is
nearly the same for the three compensations, but the
number of vectors and the bitrate are greatly reduced for
the adaptive BBMEs. Moreover, the method gives a gain in
computational load that reaches a factor of 100 with L = 3
levels of the multiresolution representation, which can be
compared with the factor of 65 obtained in Section 2.
Figure 5. In (a, b, c), the diagrams show the PSNR of the achieved compensations, the bitrate for coding the motion field,
and the computational load (for full search there are 99,847,168 EOPs). The reconstructed frames, the DFD images and the
needle diagrams are reported for the compensation between the 2nd and the 4th frames of Salesman. The full-search BBME
(d, e, f) gives a PSNR of 37.19 dB with 396 vectors coded with 0.0157 bpp. The proposed method reaches the same
PSNR of 37.19 dB with 225 vectors coded with 0.0123 bpp (g, h, i). With a different tuning of the thresholds, a PSNR of
36.93 dB is achieved with only 87 vectors and 0.0073 bpp (l, m, n).
ABSTRACT

In this paper, an efficient algorithm for multiplexing
video sources using a dynamic bandwidth allocation
scheme is presented. Video sources are grouped into
classes according to their combined levels of spatial
detail and amount of movement. Simulations have
been performed, and quality improvements have been
obtained for sequences with higher local
activity/motion.

1. INTRODUCTION 

Digital technology and techniques are progressively
being introduced in TV broadcast applications.
Among digital techniques, video compression stands
out by allowing a significantly smaller bandwidth to
be used for transmitting a full-motion digital video
signal, while still providing a high-quality service.
Its use in the TV broadcast industry is therefore
foreseen as an economic and efficient solution.

MPEG [1][2] specifies the compressed audio and
video bitstream formats and how to maintain
synchronisation between the two. With the MPEG2
video standard, an almost constant-quality signal can
be produced by allowing the bit rate of the
compressed bitstream to vary. Therefore, the video
coder produces more bits when encoding a more
complex picture, in order to keep the overall quality
constant.

Television channels will be MPEG encoded and
transmitted. Since the channels are statistically
independent, it is very unlikely that encoders working
in parallel on different video sources will
simultaneously be dealing with scenes that are difficult
to encode. In other words, applying this concept to TV
broadcasting, it is very unlikely that all TV channels
will present scenes with the same kind of complexity
and motion at the same time. The most probable
situation is to have distinct TV channels showing
different kinds of programmes: some will be presenting
sporting events with more or less motion, while others
will be transmitting news programmes, documentaries,
movies, talk shows, etc. Thus, by allowing individual
encoders to generate variable bit rates, transmission
bandwidth is dynamically allocated to each channel
according to its individual needs [3][4]. The entity
responsible for multiplexing the different TV channels
into a single transmission link will automatically assign
more bandwidth to a channel when it detects a great
amount of complexity in the scene being encoded.
This allows a more efficient use of the total
available transmission bandwidth, while maintaining
nearly constant quality across all sources. This scheme
can be applied to satellite or terrestrial broadcasting as
well as to ATM transmission.
The paper is organised as follows: section II presents
an algorithm to classify the video sources and to
dynamically allocate the total bandwidth to each
channel according to its needs, section III discusses the
associated system and network problems, and
section IV presents some simulation results.
Finally, section V discusses the results and future
work.
2. DYNAMIC BANDWIDTH ALLOCATION ALGORITHM

The dynamic bandwidth allocation algorithm consists
of four steps and is applied during each picture period.

In the first step, the reference bandwidth (BWref) of
each video source is determined on the basis of the total
available transmission bandwidth, the picture coding
complexity and type, the GOP structure of each video
source, and the current state of the total virtual buffer.

Here, $BW_I$, $BW_P$ and $BW_B$ are the bandwidths for I, P
and B pictures of a video source within a GOP;
$K_p = 1.0$ and $K_b = 1.4$ are constants dependent on the
quantisation matrices [5]; $N_p$ and $N_b$ are the
numbers of P pictures and B pictures remaining in the
current GOP in encoding order; R is the
transmission bandwidth allocated to the channel during
one GOP; and N is the total number of pictures in
the GOP. The bit rate is the ratio between the total
available transmission bandwidth and the number of
sources. The X variables are the "global complexity
measures" of the different picture types. For each
picture type, they are updated by computing the product
of the number of bits generated by encoding a picture
and the average quantization parameter (computed from
the actual quantization values used during the encoding
of all macroblocks, including the skipped ones).
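The paper's own equations for $BW_I$, $BW_P$ and $BW_B$ did not survive extraction. However, the definitions above ($K_p = 1.0$, $K_b = 1.4$, remaining $N_p$ and $N_b$, and X = bits × average quantiser) match the target-bit allocation of the MPEG-2 Test Model 5 (TM5). The sketch below implements the TM5 formulas as an assumption about the general form, not as the authors' exact expressions. The function names are hypothetical, and `bit_rate / (8 * picture_rate)` is the TM5 lower bound per picture.

```python
def tm5_targets(R, Np, Nb, Xi, Xp, Xb, bit_rate, picture_rate, Kp=1.0, Kb=1.4):
    """TM5-style target bits for the next I, P or B picture.

    R: bits available for the rest of the GOP; Np, Nb: P and B pictures
    remaining; Xi, Xp, Xb: global complexity measures of each picture type.
    """
    floor = bit_rate / (8 * picture_rate)  # TM5 minimum target per picture
    Ti = max(R / (1 + Np * Xp / (Xi * Kp) + Nb * Xb / (Xi * Kb)), floor)
    Tp = max(R / (Np + Nb * Kp * Xb / (Kb * Xp)), floor)
    Tb = max(R / (Nb + Np * Kb * Xp / (Kp * Xb)), floor)
    return Ti, Tp, Tb

def update_complexity(bits_generated, average_quant):
    """X = S * Q: bits produced by a picture times its average quantiser."""
    return bits_generated * average_quant
```

With equal weighted complexities, a B picture receives less than a P picture because $K_b > K_p$. The constants bias the allocation toward reference pictures.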

In the second step, the estimated bandwidth is
determined for the optimal distribution of the total
available transmission bandwidth according to picture
coding type and video source complexity. A measure
of the picture complexity is obtained by computing
the average value of the spatial local activity. For
macroblock j, the spatial local activity is measured from
the four luminance frame-organised sub-blocks and
the four luminance field-organised sub-blocks, using
the original pixel values:

$$act_j = 1 + \min_{sblk=1,\ldots,8}\left(\mathrm{var}_{sblk}\right) \qquad (6)$$

where

$$\mathrm{var}_{sblk} = \frac{1}{64}\sum_{k=1}^{64}\left(P_k - P_{mean}\right)^2 \qquad (7)$$

$$P_{mean} = \frac{1}{64}\sum_{k=1}^{64} P_k \qquad (8)$$

and $P_k$ are the pixel values in the original 8×8 block.

The complexity of the scene being coded can be
estimated from the complexity of the different frame
types in the GOP: the spatial complexity of the I
frame, and the motion, which determines the
complexity of the P and B frames.
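Equations (6) to (8) can be sketched directly. The function name is hypothetical, and the input is assumed to be a 16×16 luminance macroblock. The four frame-organised sub-blocks are its 8×8 quadrants. The four field-organised sub-blocks are formed from the even (top-field) and odd (bottom-field) lines. NumPy's `np.var` with its default `ddof=0` computes exactly the $\frac{1}{64}\sum (P_k - P_{mean})^2$ of Eq. (7).

```python
import numpy as np

def spatial_activity(mb):
    """act_j = 1 + min variance over the eight 8x8 luminance sub-blocks
    (four frame-organised, four field-organised) of a 16x16 macroblock."""
    mb = np.asarray(mb, dtype=np.float64)
    if mb.shape != (16, 16):
        raise ValueError("expected a 16x16 luminance macroblock")
    frame_blocks = [mb[r:r + 8, c:c + 8] for r in (0, 8) for c in (0, 8)]
    top, bottom = mb[0::2, :], mb[1::2, :]   # the two fields, 8 lines each
    field_blocks = [f[:, c:c + 8] for f in (top, bottom) for c in (0, 8)]
    return 1.0 + min(float(np.var(b)) for b in frame_blocks + field_blocks)
```

The field-organised sub-blocks keep interlaced content from being rated as busy. A macroblock of alternating black and white lines has high frame-block variance, but each field is flat, so its activity is 1.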


Class | Content complexity | Video test material
A | low spatial detail & low amount of movement | mad, akiyo, hall monitor, container ship, sean
B | medium spatial detail & low amount of movement, or vice versa | foreman, news, silent, coast guard
C | high spatial detail & medium amount of movement, or vice versa | bus, table tennis, stefan, m&c

Table 1) Classification regarding content complexity


The sequences used in the simulation are divided
according to their content complexity (Table 1),
measured through the mean and variance of the
local activity.


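In the third step, the available bandwidth is allocated
to each video source according to its estimated
bandwidth:

$$BW_j = BW_{total}\,\frac{BW_{est,j}}{\sum_{i=1}^{n} BW_{est,i}} \qquad (9)$$

where $BW_{total}$ is the total available transmission
bandwidth, $BW_{est,j}$ is the estimated bandwidth of
source j, and n is the number of video sources.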
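The proportional sharing of Eq. (9) is a one-liner. It is shown here with hypothetical names to make explicit that the allocations always sum to the total available bandwidth, whatever the estimates are.

```python
def allocate_bandwidth(total_bw, estimated_bw):
    """Share total_bw among n sources in proportion to their estimated bandwidths."""
    total_est = sum(estimated_bw)
    if total_est <= 0:
        raise ValueError("estimated bandwidths must sum to a positive value")
    return [total_bw * e / total_est for e in estimated_bw]

# e.g. three sources sharing 4.5 Mbit/s
print(allocate_bandwidth(4.5e6, [1.0, 1.5, 2.5]))  # [900000.0, 1350000.0, 2250000.0]
```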

Finally, in the last step, a reference value for the 
quantization parameter is determined for the picture. 
The reference value of the quantisation parameter is 
then modulated according to the spatial activity in the 
macroblock to obtain the value of the quantization 
parameter, mquant, that is used to quantize each 
macroblock. 
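The text does not give the modulation rule. The sketch below assumes the MPEG-2 TM5 normalisation $N_{act} = (2\,act + \overline{act}) / (act + 2\,\overline{act})$. It multiplies the reference quantiser by this factor and clips the result to the MPEG-2 range 1 to 31. The function name is hypothetical.

```python
def modulate_quant(q_ref, act, avg_act, q_min=1, q_max=31):
    """mquant = q_ref * N_act, with N_act = (2 act + avg_act) / (act + 2 avg_act)."""
    n_act = (2 * act + avg_act) / (act + 2 * avg_act)  # lies in (0.5, 2) for positive inputs
    return min(q_max, max(q_min, round(q_ref * n_act)))
```

Busy macroblocks (act above average) are quantised more coarsely, where errors are less visible. Flat ones get a finer quantiser. The factor is bounded, so no macroblock strays more than a factor of two from the reference value.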

3. NETWORK AND SYSTEMS ASPECTS 

The implementation of the proposed bandwidth
allocation scheme for multiplexing several MPEG2
video sources in broadcast applications presents some
challenges and raises a number of problems that are
presently being addressed.

One such aspect is that the individual encoder sources
and the multiplex buffer are physically apart. This
implies that a protocol must be defined so that valuable
information can be exchanged between them over the
communication links. The protocol should take into
account several requirements, such as the quality of
service imposed by the system (minimum delays and
losses, procedures to add/drop TV channels, ...).

To study these problems, a simulation framework has
been defined so that different protocols over an ATM
network, and their feasibility, can be tested. A
distributed software environment is being implemented
that allows video encoders on different machines to
communicate with the multiplexer.

Apart from the problem of exchanging control
information efficiently, it is also necessary to define the
type of information and where it is to be conveyed
from the encoders to the multiplexer: it may be
transmitted embedded in the MPEG2 flows or exchanged
over a separate connection. Experiments are being
performed to assess the best solutions for all of these
matters.

4. SIMULATION RESULTS 

Simulations were performed using MPEG video test
sequences such as “Bus”, “Foreman” and “Akiyo”. They
were grouped into three different classes, each
exhibiting different combined levels of spatial detail
and amount of movement (Table 1). Each sequence
consists of 124 frames (352×288 pels). We began our
studies with two video sources from different classes
(Table 2) and progressively added more sources
(Table 3). The last line of Tables 2 and 3 (Independent)
refers to normal CBR video encoding (at 1.5 Mbps)
and transmission.


Analysing Tables 2 and 3, we can observe that class C
sequences (high spatial detail & medium amount of
movement, or vice versa) show the largest gains,
although they remain the sequences with the lowest
quality. This gain is achieved at the cost of a reduction
in the quality of class A sequences.


Sources | Class A | Class B | Class C
AB | 35.8 | 37.4 | n.a.
AC | 31.9 | n.a. | 22.6
BC | n.a. | 32.0 | 21.9
Independent | 37.8 | 34.6 | 20.0

Table 2) SNR for multiplexing 2 video sources (dB)


Sources | Class A | Class B | Class C
ABC | 32.5 | 33.4 | 23.6
AAB | 26.4 | 36.5 | n.a.
ABB | 35.2 | 35.9 | n.a.
AAC | 26.2 | n.a. | 24.6
ACC | 30.0 | n.a. | 21.6
BBC | n.a. | 36.5 | 24.6
BCC | n.a. | 33.9 | 24.3
Independent | 37.8 | 34.6 | 20.0

Table 3) SNR for multiplexing 3 video sources (dB)


Simulation results show that uniform picture quality
is maintained between different video sources when
multiple video sources are multiplexed.

5. DISCUSSION 

This paper presents an algorithm for dynamic
bandwidth allocation that allots the available
bandwidth according to the needs of each video
source. Video sources are grouped into three different
classes, each exhibiting different combined levels
of spatial detail and amount of movement. Simulation
results show that bandwidth gains/quality
improvements are more significant when
heterogeneous sources are multiplexed together,
especially when video sources from classes C and B
are present. In future work, we will include video
sequences with shot changes, sequences that are not
GOP-aligned, and sequences with variable GOP sizes.
Further work is also needed on the implementation of
the protocol to improve the management of the
communication between the video encoders and the
multiplexer.
Abstract

Multimedia product and 
applications pose particularly 
difficult challenges for user interface 
designer. One of this tasks is the 
development of transcoding units. 
The general aspects of transcoding 
Recent advances in computing 
and communication technologies 
promise to create an infrastructure 
in which computer systems will 
support a wide range of interactive 
multimedia services in a variety of 
commercial and entertainment 
domains. Research and development 
efforts in multimedia computing fall 
into two groups. One group 
concentrates on the stand-alone 
multimedia workstation and 

associated software systems and 
tools. The other combines 
multimedia computing with 

distributed systems. Potential new 


procedure in multimedia system are 
investigated in this paper. The 
related field is the development of an 
educational multimedia system. 

1. Introduction 

applications based on distributed 
multimedia systems include 
multimedia information systems, 
conferencing systems, on-demand 
multimedia services, and distance 
educa-tion. In its simplest 
configuration of distributed 
multimedia system, the architecture 
will comprise of multimedia 
information server connected to 
clients via networks. Clients will 
dial-up the server and request the 
retrieval of information objects 
(consisting of audio, video, text, 
imagery, animation, etc.) stored at 
the server. Distributed multimedia 
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systems require continuous data 
transfer over relatively long periods 
of time, media synchronization, large 
storage, special indexing and 
retrieval techniques. 


2. Transcoding

Among all the above-mentioned data types, the amount of available digital video has increased dramatically in the last few years. Several image data compression methods and standards (JPEG, M-JPEG, H.261, MPEG-...) were created during the last decades for the effective handling of such huge amounts of data. Because digital video sources involve different image formats and compression standards, it is necessary to convert any format and standard into any other (Fig. 1). This conversion is called the transcoding procedure. Naturally, this conversion is a very complex problem, and no general solution exists at present. In this presentation, we concentrate on some partial solutions in this field.
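As one concrete instance of such a conversion, the JPEG to H.261 pair needs a chrominance format change: the hardware JPEG codecs considered in this work deliver 4:2:2 video (chrominance halved horizontally only), whereas H.261 works with chrominance halved both horizontally and vertically. The Python sketch below halves a chrominance plane vertically. Averaging each pair of rows (rather than simply dropping every second row) and the function names are our assumptions, not the authors' implementation.

```python
def halve_chroma_vertically(plane):
    """Return a chroma plane (list of rows) with half as many rows.

    Each output row is the rounded mean of two vertically adjacent input
    rows, a simple low-pass step before decimation (an assumption: plain
    row dropping would also halve the height, with more aliasing).
    """
    if len(plane) % 2:
        raise ValueError("plane height must be even")
    return [
        [(top + bottom + 1) // 2 for top, bottom in zip(row0, row1)]
        for row0, row1 in zip(plane[0::2], plane[1::2])
    ]


def jpeg422_to_h261_chroma(cb, cr):
    """Convert the Cb and Cr planes of a 4:2:2 frame to vertically halved planes."""
    return halve_chroma_vertically(cb), halve_chroma_vertically(cr)


cb = [[100, 102], [104, 106], [50, 50], [52, 52]]
cr = [[128, 128], [128, 128], [60, 70], [62, 72]]
print(jpeg422_to_h261_chroma(cb, cr))  # ([[102, 104], [51, 51]], [[128, 128], [61, 71]])
```

The horizontal chrominance resolution is left untouched, since 4:2:2 video is already halved in that direction.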


Fig. 1. Block scheme of the transcoder method (decoder units and encoder units)


In our work, the main purpose is to determine the design and implementation aspects of the transcoder unit. The above-mentioned standards involve the following processing units: DCT, quantizer, motion compensation, lossless coding, etc. Previously, we developed simulation software (DCT, wavelet, DPCM, block-matching motion compensation, etc.) for the investigation of these units. At present, we are developing simulation software for the investigation of transcoding procedures, building on some of these earlier results. Our main task is the development of an educational distributed multimedia system and also of a healthcare distributed multimedia system in which the clients are multimedia PCs. Transcoding is a critical part of this work.
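Because the DCT is the most time-consuming of these processing units, the separable (row-column) formulation of the 2D DCT is worth illustrating: a 1D DCT is applied to every row, the intermediate matrix is transposed, and the 1D DCT is applied to its rows again. The following Python sketch is a plain, unoptimised illustration using the orthonormal DCT-II, whose 8x8 scaling matches the JPEG forward DCT. The function names are hypothetical, and a fast 1D algorithm would replace the direct O(N^2) sum in practice.

```python
import math


def dct_1d(v):
    """Orthonormal 1D DCT-II, computed directly from its definition (O(N^2))."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        acc = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, x in enumerate(v))
        out.append(scale * acc)
    return out


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    """2D DCT by the row-column method: rows, transpose, rows again, transpose back."""
    rows_done = [dct_1d(row) for row in block]                  # 1D DCT of every row
    cols_done = [dct_1d(col) for col in transpose(rows_done)]   # rows of the transposed matrix
    return transpose(cols_done)                                 # result[v][u]: vertical, horizontal frequency


flat = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d_row_column(flat)
print(round(coeffs[0][0], 6))  # 8.0
```

For a flat 8x8 block of ones the sketch returns 8.0 in the DC position and (numerically) zero elsewhere.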
We can investigate the general problem of the transcoding procedure only to a limited extent, e.g. for a given matched decoder/encoder pair. The main bottleneck in this problem is the large number of operational steps, such as DCT computations and memory operations. From this point of view, the relatively simpler pairs, such as JPEG/H.261 and M-JPEG/H.261, are investigated first using simulations. Later, the MPEG standard, the most promising general standard, will also be included. The most time-consuming operation is the processing of transform coefficients; therefore the separable version of the 2D DCT will be applied. Frame memory is needed for the intermediate processing.

In general, many 2D DCT algorithms have been proposed to reduce the computational complexity and to increase the operational speed. These algorithms can be divided into two groups: the row-column method and the direct 2D method. The row-column method computes the 2D DCT by applying the 1D DCT to the rows of the input image data frames, storing the transformed results in an intermediate matrix, transposing the matrix, and performing the 1D DCT again on the rows of the transposed matrix (i.e. on the columns of the intermediate result). Since many 1D DCT algorithms exist, there are also many realizations of the row-column method.

In the case of JPEG, the input image is first divided into non-overlapping blocks of 8x8 pels. Each block is transformed into the frequency domain by the DCT. The standard does not specify a unique DCT algorithm; consequently, users may choose the algorithm best suited to their applications. The DCT coefficients are quantized and entropy coded.

In the JPEG decoder, after the coding and quantization tables have been extracted from the compressed bit stream, the compressed data pass through an entropy decoder. The DCT coefficients are first dequantized and then transformed back to the spatial domain via an inverse DCT. After a block-to-raster translation, the image is fully decoded.

The H.261 encoding algorithm, like MPEG, uses a combination of DCT coding and differential coding. The main elements of an H.261 encoder are frame prediction, DCT transformation, quantization of transform coefficients and variable length coding. The DCT coding path is similar to the one used in JPEG and MPEG. As in JPEG, the DCT operates on 8 x 8 picture blocks. Four luminance (Y) blocks, one B - Y and one R - Y colour difference block are combined to form a macroblock.

In the H.261 decoder, the compressed input is buffered and processed by the variable length decoder. The decoded data are parsed and then processed by an inverse quantizer and an inverse DCT.

Since most hardware-based JPEG codecs support only 4:2:2 decimated video, while H.261 uses 4:1:1 decimated video, one possible transcoding solution is to subsample the JPEG chrominance components vertically by two.

The transcoding procedure is more complicated for other standard pairs (e.g. JPEG - MPEG). In the first phase of our work, we concentrate on solving the transcoding procedure for the JPEG - H.261 pair only.

The details of this work are image format or resolution conversion, chrominance signal subsampling, intermediate processing of the decoded DCT components, processing of the intra- and inter-blocks of a GOP assuming H.261 coding, etc. The received input signal will first be decoded, then its components will be mapped into an intermediate format in the transcoder unit, and finally the "new" coded signal will be created from the intermediate format. One of the most …

3. Implementation aspects

Naturally, implementation requires high-speed compression tools, i.e. codecs (compression/decompression devices). Software solutions, even when running on parallel platforms, give poor results that do not satisfy most real-time requirements.

Hardware solutions for accepted standards are provided by specialised, inexpensive VLSI codec chips or chip families. However, because of their specialised functions, they do not provide the flexibility that may be highly desirable for advanced, demanding applications using various or novel compression methods, or simply for testing different compression algorithms. In such applications, flexible, dynamically (re)configurable fast hardware structures would be favoured over the rigid structure of specialised codecs.

For low-level image processing, motion analysis and image coding, the methods and techniques usually involve a few processing steps or signal processing operations (such as convolution/filtering, discrete transforms and inverse transforms, efficient coding methods, special effect calculations, format transforms, pixel clustering, vector quantization, etc.). These steps have to be chained or pipelined one after another, while the digital image data flow through the chain and the image is thus transformed. In a transcoding operation, the type of processing is similar on both sides, encoding and decoding.

Available and future VLSI FPGAs (Field Programmable Gate Arrays) are well suited to forming add-on parallel coprocessor systems for workstations, PCs and video/graphics tools. Their internal logic structure can be defined (configured, i.e. tuned to the needs of the algorithms) by downloading the configuration data (pattern) into their internal configuration RAMs. Recent developments in FPGA chips take signal/image processing needs into account, offering a very flexible, fine-grained internal structure. For development purposes, and also for simulating the behaviour of the intended internal logic, convenient and sophisticated software tools are available to support configuration design [8]. Since the configuration and its operation are defined by the configuration pattern, downloaded and stored in internal RAMs, the structure can be dynamically changed or reconfigured (during run-time) simply by downloading a new pattern. Novel
FPGAs (e.g. the ATMEL AT6000) allow partial reconfiguration, which speeds up the process: a single cell operation may be changed in 200 ns.

The use and development of reconfigurable structures for sophisticated high-speed multimedia and image/signal processing applications is an emerging research area. Increasing effort is being concentrated on questions such as what the application requirements are, how they can be met using reconfiguration, how task partitioning and control-pattern downloading affect the computational overhead, what the performance measurement and evaluation criteria should be, etc. To date, however, the early results and publications agree that this novel field of high-speed sophisticated systems is a promising one.

At present, the development of FPGA units is under way based on the simulation results, and in parallel we are working on an evaluation method for the new units.

4. Conclusion

Our work is part of the development of an educational multimedia system based on the server-client model. The work is in its initial phase; first, simulation software was developed to verify our algorithms.

References

[1] ACM'95 Proceedings.

[2] B. Furht, S. W. Smoliar, H. Zhang: Video and Image Processing in Multimedia Systems. Kluwer Academic Publishers, 1995.

[3] D. Minoli, R. Keinath: Distributed Multimedia Through Broadband Communication Services. Artech House, 1994.

[4] ITU-T Recommendation H.261: Video codec for audiovisual services at p x 64 kbit/s.

[5] B. Furht, M. Milenkovic: A Guided Tour of Multimedia Systems and Applications. IEEE Computer Society Press, Los Alamitos, California, 1995.

[6] FAST - Fast Electronic GmbH: M-JPEG Option, Handbuch, 1994.

[7] Digitalvideo Computing GmbH: MPEG Development Kit Reference Manual 2.0, 1995.

[8] Giga Operations Corp.: SPECTRUM (TM) Reconfigurable Computing Platform. CA, USA.



HUMAN FACE RECOGNITION: AUTOMATIC FACE DETECTION 


G. Marcone"^, A. Fusi"^, G. Stoppani^ and G. Orlandi^"^ 


* Fondazione Ugo Bordoni, Roma, Italy 
** University of Rome “La Sapienza”, Italy
Tel. : +396 5480 2135; Fax: +396 5480 4401 
e-mail: gmarcone@fub.it


ABSTRACT 

In this paper a method to automatically locate the head of an individual in a generic image is proposed. It is based on the hypothesis that the outline of a human head can be seen as an elliptical structure. The proposed method exhaustively evaluates all the possible ellipses associated with the edge map information and selects the ellipse that best fits the head depicted in the image. The results demonstrate the robustness of the method to variations in lighting, tilt and translation in the image plane.


1. INTRODUCTION 

In recent years, the problem of human face identification has attracted considerable attention because of the many applications of automatic face recognition systems (controlling access to secure buildings, enhancing the security of user authentication at ATMs, improving information security, etc.). However, face recognition research [1] has mainly focused on distinguishing an input face image from a database of known face images, while the task of detecting faces in an arbitrary background is usually carried out either by hand segmentation or by capturing faces against a known uniform background.

Face detection is directly relevant to the face recognition problem. In particular, identifying and locating faces in an unknown image is the first important step in implementing a fully automatic human face recognizer. Moreover, face detection has potential applications in human-computer interfaces and surveillance systems. A face finder can make workstations with cameras more user-friendly by turning monitors on and keeping them active whenever someone is in front of the camera. The problem of detecting a human face within digitized images consists of determining whether or not there is a face in an arbitrary image and, if so, segmenting the face region from non-face regions, returning the location and spatial extent of the face in the image plane.

There are various approaches to the problem of face detection: the fixed templates approach [3], the deformable templates approach [4], and the use of spatial image invariants. In the first approach, a difference measure between a fixed reference pattern (or a bank of reference sub-patterns) and candidate image locations is computed, and the output is thresholded for matches. In the second approach, a deformable template is fitted to different parts of the image and the output is then thresholded for matches. The last approach is based on a set of spatial image relationships common to all face patterns, and positive occurrences of these invariants are checked at all candidate locations.

In this work a computational methodology for detecting human faces in digitized images has been developed. It is inspired by the works [5] and [6] and uses a priori information: the rough hypothesis, largely verified in practice, that the outline of a human head can be seen as an elliptical structure. It can be roughly classified as a deformable template method because it distinguishes the face region by using a parametrized elliptical template. This template marks the boundary between the face region and the background.

The edge map of the image is processed to obtain a high-level description of the scene depicted in the image. This description is based on the main edge lines of the objects contained in the scene. The edge lines are then subdivided into edge segments. These segments are paired with other edge segments and fitted to a linearized equation of the ellipse. The parameters of the ellipse (centre point, semi-axes) are found by solving a 4x4 system of linear equations. After all possible pairs of segments have been considered, a set of corresponding ellipses is obtained. For each ellipse, the edge curves that intersect it are grouped according to a suitable procedure. The set of fitting ellipses is then classified according to a function that measures the fitness between each ellipse and its corresponding edge curves. Further, a selection procedure has been implemented to form the best fitting ellipse set with the corresponding edge curve sets.

The best fitting ellipse is then calculated as the average of the ellipses in the best fitting ellipse set. A second ellipse is calculated by solving a linearized over-determined system of equations, obtained by considering all the points of the edge curves corresponding to the ellipses in the best fitting set.

Finally, a cost function based on measurements of the differences between the two calculated ellipses is defined to evaluate the quality of the detection task. This global parameter is suitable for an automatic procedure.

The paper is organized as follows: Section 2 gives a detailed description of the detection procedure used. Section 3 deals with the analysis of the results. Finally, Section 4 describes several experimental results.


2. FACE IMAGE DETECTION ALGORITHM 

The detection algorithm is based on the edge map 
information of the input face images. 

The edge map is a low level description of the scene 
depicted in a generic input image. It contains information 
about the location, size and shape of the objects in the 
image. This information is used to accomplish the 
subsequent segmentation task, that usually separates 
objects of interest from the rest of the image content. 

In the image class considered in this work, the object of interest is the face of an individual depicted in the foreground. The corresponding edge map contains the outline and the features of the face. It also contains the edge information of objects that belong to the background, which may mislead the segmentation task. Depending on the application, the background may be uniform (without objects) or non-uniform (with objects of various sizes and shapes).



Figure 1. Intensity face image and its edge map from 
Canny’s edge detector 


In this work, Canny’s algorithm [7] has been implemented to obtain the edge map of face images with a non-uniform background. The algorithm has been tuned to bring out the head outline, discarding the edges corresponding to weakly marked features, beards, grizzled hair, etc.

Figure 1 shows an example of a generic input image and 
its Canny edge map. 

2.1 EDGE MAP PROCESSING 

The edge map alone cannot provide a description based on the main lines of the objects contained in the image. Further processing is needed to obtain a high-level description suitable for segmenting the face from the rest of the image.

The first processing task is the removal of intersection points. These points occur in the edge map where different objects occlude each other. They must be removed in order to preserve the integrity of the edge lines belonging to the object in the foreground. The removal procedure used in this work is inspired by [5] and consists of disjoining the intersection of two edge segments that do not belong to the same object. The removal is carried out by multiplying (logical AND) the intensity of the points in the neighbourhood of an intersection point by a suitable binary mask.

The second processing step consists of searching for and following the main lines of the objects in the image.

The line following algorithm marks all the contiguous points that belong to the same line. A line ends when there are no more contiguous unmarked points. For lines more than one pixel thick, the algorithm extracts from them all the possible one-pixel-thick lines. The result of the algorithm is a set of lines or curves c, each 1 pixel thick. In the following, this set will be referred to as the curve set C.
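The line following step can be sketched as below, assuming the edge map has already been thinned to 1-pixel-thick lines and is given as a set of (row, col) pixel co-ordinates. The choice of starting pixels and the neighbour scan order (direct neighbours before diagonal ones, so that corners are not cut) are our assumptions, the function name is hypothetical, and the extraction of several 1-pixel lines from thicker lines is not covered.

```python
# 8-connected neighbour offsets (row, col): direct neighbours first, then diagonals,
# so that a trace does not cut corners of the line.
NEIGHBOURS = [(-1, 0), (0, -1), (0, 1), (1, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def follow_lines(edge_pixels):
    """Split a set of 1-pixel-thick edge pixels (row, col) into curves.

    Each curve is a list of contiguous pixels; a curve ends when its last
    pixel has no unmarked 8-neighbour left.
    """
    edge = set(edge_pixels)
    unmarked = set(edge)

    def degree(p):
        return sum((p[0] + dr, p[1] + dc) in edge for dr, dc in NEIGHBOURS)

    # Prefer line end points (exactly one neighbour) as starting pixels, so open
    # curves are traced from one end; closed loops are picked up afterwards.
    starts = sorted(edge, key=lambda p: (degree(p) != 1, p))
    curves = []
    for start in starts:
        if start not in unmarked:
            continue
        curve, current = [start], start
        unmarked.discard(start)
        while True:
            nxt = next(
                ((current[0] + dr, current[1] + dc) for dr, dc in NEIGHBOURS
                 if (current[0] + dr, current[1] + dc) in unmarked),
                None,
            )
            if nxt is None:
                break  # no contiguous unmarked point: the line ends here
            curve.append(nxt)
            unmarked.discard(nxt)
            current = nxt
        curves.append(curve)
    return curves


print(follow_lines({(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (5, 5), (6, 6)}))
```

The example returns two curves, the L-shaped line traced from (0, 0) to (2, 2) and the short diagonal from (5, 5) to (6, 6).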

The third processing step consists of cutting each curve c 
of C into smaller and linear components (i.e. edge 
segments). This task may be seen as an approximation of 
a curve with a polyline. The result of the cutting 
procedure is a number of segments s corresponding to the 
curve c. All the segments s obtained for each curve c of C 
form the segment set S. 
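The paper does not say how a curve is cut into edge segments. One standard way to approximate a curve by a polyline is the Ramer-Douglas-Peucker algorithm, used here as a swapped-in technique: the curve is split recursively at the point farthest from the current chord until every point lies within a tolerance `tol` (a hypothetical value, in pixels) of its segment. Each returned pair of end points is an edge segment s; collecting them over all curves c of C gives the set S.

```python
import math


def _distance_to_chord(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)
    if norm == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / norm


def cut_curve(curve, tol=1.5):
    """Approximate a curve (list of points) by a polyline; return its segments.

    Ramer-Douglas-Peucker: split at the point farthest from the chord until
    every point lies within tol of its segment.
    """
    def split(lo, hi):
        worst, idx = 0.0, None
        for i in range(lo + 1, hi):
            d = _distance_to_chord(curve[i], curve[lo], curve[hi])
            if d > worst:
                worst, idx = d, i
        if idx is None or worst <= tol:
            return [lo, hi]
        return split(lo, idx)[:-1] + split(idx, hi)

    if len(curve) < 2:
        return []
    knots = split(0, len(curve) - 1)
    return [(curve[i], curve[j]) for i, j in zip(knots, knots[1:])]


corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(cut_curve(corner, tol=0.5))  # [((0, 0), (3, 0)), ((3, 0), (3, 3))]
```

The segment set would then be `S = [s for c in curves for s in cut_curve(c)]`.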


2.2 FITTING PROCEDURE 

The ellipse fitting procedure takes a pair of edge segments (Si, Sj) in S and tries to fit them to a linearized equation of the ellipse. The parameters of the ellipse are found by solving a 4x4 system of linear equations according to the following procedure.

Starting from the equation of the ellipse:

$$ \frac{(x-x_0)^2}{a^2} + \frac{(y-y_0)^2}{b^2} = 1 \qquad (1) $$

where (x0, y0) is the ellipse centre, a the horizontal semi-axis and b the vertical semi-axis, equation (1) can be rearranged as:

$$ 2x\,a_0 - y^2 a_1 + 2y\,a_2 - a_3 = x^2 \qquad (2) $$

where:

$$ a_0 = x_0, \qquad a_1 = \frac{a^2}{b^2}, \qquad a_2 = \frac{a^2}{b^2}\,y_0, \qquad a_3 = x_0^2 + \frac{a^2}{b^2}\,y_0^2 - a^2 $$

Substituting into equation (2) the co-ordinates of the extremal points (x1, y1), (x2, y2) of segment Si and (x3, y3), (x4, y4) of segment Sj, it is possible to form the following system of equations:

$$ \begin{pmatrix} 2x_1 & -y_1^2 & 2y_1 & -1 \\ 2x_2 & -y_2^2 & 2y_2 & -1 \\ 2x_3 & -y_3^2 & 2y_3 & -1 \\ 2x_4 & -y_4^2 & 2y_4 & -1 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} x_1^2 \\ x_2^2 \\ x_3^2 \\ x_4^2 \end{pmatrix} $$

After finding (a0, a1, a2, a3), the parameters of the ellipse can be obtained according to the following formulas:

$$ x_0 = a_0, \qquad y_0 = \frac{a_2}{a_1}, \qquad a = \sqrt{a_0^2 + \frac{a_2^2}{a_1} - a_3}, \qquad b = \frac{a}{\sqrt{a_1}} \qquad (3) $$



The calculated parameters (3) are selected to describe a 
possible candidate ellipse when the centre, a and b are 
within the image plane, and the ratio b/a is bounded 
within a range of allowable values for most faces. 

After all possible pairs of edge segments (Si, Sj) in the segment set S have been considered, a new set E of corresponding ellipses eij is obtained. For convenience of exposition, in the following a generic ellipse eij will be referred to as e.
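The fitting step of Section 2.2 can be sketched in plain Python as follows: the four end points of a segment pair fill the 4x4 system built from equation (2), a small Gauss-Jordan solver returns (a0, a1, a2, a3), and the centre and semi-axes are recovered from the definitions of a0, ..., a3. The candidate test follows the acceptance rule stated above. Reading "within the image plane" as "the bounding box lies inside the image", the ratio range (1.0, 2.0), and all function names are our assumptions.

```python
import math


def solve_linear(A, c):
    """Solve the square system A x = c by Gauss-Jordan elimination with partial pivoting.

    Returns None when the system is (numerically) singular.
    """
    n = len(A)
    m = [list(row) + [rhs] for row, rhs in zip(A, c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ellipse_from_segments(seg_i, seg_j):
    """Fit eq. (2) to the end points of two segments; return (x0, y0, a, b) or None."""
    points = [*seg_i, *seg_j]
    A = [[2 * x, -y * y, 2 * y, -1.0] for x, y in points]
    c = [x * x for x, _ in points]
    sol = solve_linear(A, c)
    if sol is None:
        return None
    a0, a1, a2, a3 = sol
    if a1 <= 0:                       # a1 = a^2/b^2 must be positive for an ellipse
        return None
    x0, y0 = a0, a2 / a1
    a_sq = a0 * a0 + a2 * a2 / a1 - a3
    if a_sq <= 0:
        return None
    a = math.sqrt(a_sq)
    return x0, y0, a, a / math.sqrt(a1)


def is_candidate(e, width, height, ratio_range=(1.0, 2.0)):
    """Keep ellipses lying inside the image with a face-like b/a ratio.

    ratio_range is a hypothetical default; the paper only says the ratio is bounded.
    """
    x0, y0, a, b = e
    inside = 0 <= x0 - a and x0 + a <= width and 0 <= y0 - b and y0 + b <= height
    return inside and ratio_range[0] <= b / a <= ratio_range[1]
```

For four points sampled from an ellipse with centre (50, 60) and semi-axes 20 and 30, `ellipse_from_segments` recovers those parameters to within rounding error, and the result passes `is_candidate` in a 128x120 image.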


2.3 GROUPING PROCEDURE

For each ellipse e in the fitting ellipse set E, the edge curves c that fall within a particular area around the ellipse are grouped according to the following procedure:
• for each ellipse e with parameters (x0, y0, a, b), two ellipses e^in and e^out are defined. The first, with parameters (x0, y0, a(1-x), b(1-x)), is internal to e, while the second, with parameters (x0, y0, a(1+x), b(1+x)), is external to e. These two ellipses have the same centre as e and fix the boundaries of a particular area A around the ellipse e.
• for each curve c of C, the number of points that fall within the area A is calculated. If these points are more than 70% of the overall curve points, the curve c is taken as a component curve of the ellipse e.

Figure 2 gives a detailed explanation of the above procedure. At the end of the grouping procedure, for each ellipse e there is a corresponding component curve set C^e = {c_k^e; k = 1, ..., Ne} with Ne component curves.

Figure 2. Ellipse e and its region A delimited by the internal and external ellipses e^in and e^out respectively. In this case the component curve set C^e is formed by the curves c1, c4 and c5.

2.4 ORDERING PROCEDURE

At this point it is possible to classify the fitting ellipse set E according to a voting method. For this purpose, a function f(e) that measures the fitness of the ellipse e to the edge curves is defined. The value of f(e) is taken to be the sum of the lengths of the component curves in C^e. The ellipse set E is ordered according to the values of f(e), obtaining the ordered fitting ellipse set E^ord.
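The grouping and ordering procedures can be sketched together. Because e^in and e^out scale both semi-axes of e by the same factor, a point lies in the area A exactly when its normalised radius sqrt(((x - x0)/a)^2 + ((y - y0)/b)^2) lies between 1 - x and 1 + x. The sketch calls that width parameter `width`; its value is not given in the text, so 0.1 is a hypothetical default. Measuring curve length as polyline length is also our assumption. f(e) is the summed length of the component curves, and the ellipses are sorted by decreasing f(e).

```python
import math


def in_area(p, e, width):
    """True if point p lies in the area A between e_in and e_out.

    e_in and e_out scale both semi-axes of e by (1 - width) and (1 + width),
    so A is the set of points whose normalised radius lies in that range.
    """
    x0, y0, a, b = e
    rho = math.hypot((p[0] - x0) / a, (p[1] - y0) / b)
    return 1 - width <= rho <= 1 + width


def component_curves(e, curves, width=0.1, share=0.7):
    """Curves with more than `share` of their points inside area A (70% in the text)."""
    return [c for c in curves if sum(in_area(p, e, width) for p in c) > share * len(c)]


def curve_length(c):
    """Polyline length of a curve given as a list of points."""
    return sum(math.dist(p, q) for p, q in zip(c, c[1:]))


def order_ellipses(ellipses, curves, width=0.1):
    """Return (f(e), e, C^e) triples sorted by decreasing fitness f(e)."""
    scored = []
    for e in ellipses:
        comps = component_curves(e, curves, width)
        scored.append((sum(curve_length(c) for c in comps), e, comps))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored
```

A quarter-circle arc of radius 10 makes the circle (0, 0, 10, 10) rank first with f(e) of about 15.7, while an ellipse far from every curve scores 0.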


2.5 SELECTION PROCEDURE 

From the above sets, it is now possible to define a suitable subset of ellipses E^best, the best fitting ellipse set. This subset is formed according to the following steps:

• step 1) select the first ellipse e of E^ord, with parameters (x0, y0, a, b).

• step 2) calculate a view centre point (xc, yc). For the first ellipse, the view centre is the ellipse centre (x0, y0); when more ellipses are considered, the view centre is the mass centre of those ellipses (see Figure 3a).

• step 3) calculate the coverage C: starting from the view centre point, the angle αk subtended by each component curve is determined (see Figure 3b); then the sum Σk αk is computed, discarding overlapping areas.

• step 4) test whether C exceeds a threshold x (a percentage of the round angle). If so, go to the next step; otherwise, also take into account the component curves of a further ellipse, selecting the next ellipse in E^ord, and repeat from step 2.

• step 5) define the best fitting ellipse set E^best. The set consists of the M ellipses whose component curves give a coverage of 60% of the round angle.



Figure 3. Two examples are reported: a) the view centre point (xc, yc) for the ellipses e1, e2, e3, e4; b) the angles α1, α2, α3, αk corresponding to the component curves c1, c2, c3, ck.
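Steps 2 to 5 of the selection procedure can be sketched as follows. Assumptions (ours, not the paper's): the "mass centre of the ellipses" is taken as the mean of their centres; the round angle is quantised into 360 sectors, so overlapping angles are counted only once; and each curve is assumed to sweep the shorter arc between consecutive points. The function names are hypothetical. `ranked` is expected as (f(e), e, C^e) triples sorted by decreasing f(e), and the 60% threshold is the value given in step 5.

```python
import math


def view_centre(ellipses):
    """Mass centre of the considered ellipses, taken here as the mean of their centres."""
    n = len(ellipses)
    return sum(e[0] for e in ellipses) / n, sum(e[1] for e in ellipses) / n


def coverage(curves, centre, bins=360):
    """Fraction of the round angle covered by the curves, seen from `centre`.

    Angles are quantised into `bins` sectors; marking sectors in a shared
    table discards overlapping contributions automatically.
    """
    cx, cy = centre
    covered = [False] * bins

    def sector(p):
        ang = math.atan2(p[1] - cy, p[0] - cx) % (2 * math.pi)
        return int(ang / (2 * math.pi) * bins) % bins

    for c in curves:
        idx = [sector(p) for p in c]
        for s in idx:
            covered[s] = True
        for s, t in zip(idx, idx[1:]):
            step = (t - s) % bins
            if step > bins // 2:  # walk the shorter way round the circle
                s, step = t, bins - step
            for k in range(step + 1):
                covered[(s + k) % bins] = True
    return sum(covered) / bins


def select_best(ranked, threshold=0.6):
    """Add ellipses in order of fitness until their curves cover `threshold` of the round angle."""
    chosen, curves = [], []
    for _, e, comps in ranked:
        chosen.append(e)
        curves.extend(comps)
        if coverage(curves, view_centre(chosen)) >= threshold:
            break
    return chosen  # all ellipses if the threshold is never reached
```

With one ellipse whose curves cover the upper half circle and a second covering the lower half, the first alone gives a coverage of about 0.5, so a 60% threshold makes `select_best` keep both.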


3. THE BEST FITTING ELLIPSE 


The solution to the detection problem is to calculate the ellipse E1, which is the average of the M ellipses of E^best, according to:

$$ (\bar{x}_0, \bar{y}_0, \bar{a}, \bar{b}) = \frac{1}{M} \sum_{m=1}^{M} (x_{0,m},\, y_{0,m},\, a_m,\, b_m) $$

In order to evaluate the quality of the detection task, it is important to define a global function suitable for an automatic procedure. For this purpose, a second ellipse E2 is calculated, considering all the points of the component curves (N points in total) of the M ellipses of E^best. The co-ordinates of these points are substituted in equation (2), obtaining the following over-determined system of equations:

$$ \begin{pmatrix} 2x_1 & -y_1^2 & 2y_1 & -1 \\ 2x_2 & -y_2^2 & 2y_2 & -1 \\ \vdots & \vdots & \vdots & \vdots \\ 2x_N & -y_N^2 & 2y_N & -1 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} x_1^2 \\ x_2^2 \\ \vdots \\ x_N^2 \end{pmatrix} \qquad (4) $$

This system is of the form AX = C, where A is Nx4, X is 4x1 and C is Nx1. It can be solved using the pseudo-inverse method: X = (A^T A)^{-1} A^T C.

If a curve belongs to more than one component curve set C^e, its points are considered several times in system (4).
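Both ellipses can be computed in a few lines. This is a minimal pure-Python sketch: E1 averages the parameters of the M selected ellipses, and E2 solves system (4) through the normal equations (the pseudo-inverse X = (A^T A)^{-1} A^T C), then converts (a0, ..., a3) back to (x0, y0, a, b) using the definitions of the coefficients. The function names are ours. Because forming A^T A squares the condition number, a QR- or SVD-based least-squares solver would be more robust in practice; the normal equations are kept here only because they mirror the text.

```python
import math


def solve_linear(A, c):
    """Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(A)
    m = [list(row) + [rhs] for row, rhs in zip(A, c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def average_ellipse(ellipses):
    """E1: component-wise mean of the (x0, y0, a, b) parameters of the M best ellipses."""
    m = len(ellipses)
    return tuple(sum(e[i] for e in ellipses) / m for i in range(4))


def least_squares_ellipse(points):
    """E2: solve the over-determined system (4) as X = (A^T A)^-1 A^T C.

    Points of curves shared by several component sets may simply be repeated.
    """
    A = [[2 * x, -y * y, 2 * y, -1.0] for x, y in points]
    C = [x * x for x, _ in points]
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(4)] for i in range(4)]
    AtC = [sum(row[i] * c for row, c in zip(A, C)) for i in range(4)]
    sol = solve_linear(AtA, AtC)
    if sol is None:
        raise ValueError("degenerate point set")
    a0, a1, a2, a3 = sol
    x0, y0 = a0, a2 / a1
    a = math.sqrt(a0 * a0 + a2 * a2 / a1 - a3)
    return x0, y0, a, a / math.sqrt(a1)
```

Twelve points sampled from the ellipse (5, 4, 3, 2) give back those parameters from `least_squares_ellipse`, and `average_ellipse([(0, 0, 2, 4), (2, 2, 4, 6)])` gives (1.0, 1.0, 3.0, 5.0).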

Further, a global quality parameter Sm has been introduced, which is the average value of the similitude trace function Tr:

$$ S_m = \frac{1}{128} \sum_{k=0}^{127} Tr(k) $$

This function is based on measurements of the differences between the two ellipses and is calculated as follows:


• calculate the average centre (xm, ym) of the two ellipses E1 and E2;

• define a semi-axis (half-line) Vk with origin in (xm, ym) and orientation θk = k·2π/128, k = 0, ..., 127;

• calculate the intersection points (x1k, y1k) and (x2k, y2k) between Vk and the two ellipses E1 and E2;

• calculate the function Tr(k):

$$ Tr(k) = \sqrt{(x_{1k} - x_{2k})^2 + (y_{1k} - y_{2k})^2} $$

For more details see Figure 4. 



Figure 4. The similitude trace Tr of the two ellipses E1 and E2 is drawn for 0 ≤ k ≤ 127. For a given semi-axis Vk, the intersection points (x1k, y1k), (x2k, y2k) and the value of Tr(k) are outlined.

The similitude trace function Tr(k) provides useful information about the quality of the detection task. For low values of Sm, the two ellipses E1 and E2 give concordant results in terms of face detection, so it is possible to assume that the detection task is of good quality (see the Sm values in Figures 5a and 5b).

This information could usefully be employed in a fully automatic segmentation system: if the detection task has a global quality parameter Sm below a given threshold, the subsequent segmentation task may be carried out.
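The similitude trace and Sm can be sketched as follows. The rays are the half-lines Vk from the average centre at θk = k·2π/128, and each intersection is the positive root of a quadratic in the distance t along the ray. Reading Tr(k) as the Euclidean distance between the two intersection points is our interpretation of the source formula. The sketch also assumes that the average centre lies inside both ellipses, which holds when E1 and E2 roughly agree. The function names are hypothetical.

```python
import math


def ray_hit(e, origin, theta):
    """Intersection of the half-line from `origin` at angle theta with ellipse e.

    Assumes origin lies inside e, so the quadratic in t has exactly one
    positive root.
    """
    x0, y0, a, b = e
    ox, oy = origin
    u, v = math.cos(theta), math.sin(theta)
    dx, dy = ox - x0, oy - y0
    qa = (u / a) ** 2 + (v / b) ** 2
    qb = 2 * (dx * u / a ** 2 + dy * v / b ** 2)
    qc = (dx / a) ** 2 + (dy / b) ** 2 - 1
    if qc >= 0:
        raise ValueError("origin must lie inside the ellipse")
    t = (-qb + math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)
    return ox + t * u, oy + t * v


def similitude_trace(e1, e2, n=128):
    """Tr(k), k = 0..n-1: distance between the two intersection points on ray V_k."""
    origin = ((e1[0] + e2[0]) / 2, (e1[1] + e2[1]) / 2)
    trace = []
    for k in range(n):
        theta = k * 2 * math.pi / n
        trace.append(math.dist(ray_hit(e1, origin, theta), ray_hit(e2, origin, theta)))
    return trace


def quality(e1, e2):
    """Global quality parameter S_m: the mean of the similitude trace."""
    trace = similitude_trace(e1, e2)
    return sum(trace) / len(trace)
```

For two concentric circles of radius 10 and 12, every Tr(k) is 2, so `quality` returns 2.0 up to rounding. A detection would be accepted when `quality(E1, E2)` falls below a chosen threshold. The text gives no value for that threshold, so any constant such as a hypothetical `MAX_SM` would have to be tuned.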


4. EXPERIMENTAL RESULTS

The proposed segmentation method has been tested on the M.I.T. face image databases, consisting of 128x120 pixel images with variable imaging conditions (moderately cluttered background, variable lighting, etc.). Some results of the segmentation procedure are reported in Figure 6. The images in the 1st row show that the segmentation algorithm is robust to lighting variations. The images in the 2nd and 3rd rows show that the algorithm is also independent of head tilting. The algorithm is also independent of the location of the head in the image plane, as shown in the 4th row. Finally, it also performs well on noisy face images.


5. CONCLUSIONS 

In this work an algorithm for automatic face detection has been implemented. It differs from similar methods in the computing methodology used to obtain the ellipses that best fit the head outline. In particular, 4x4 systems of linear equations based on edge segments have been used, obtaining a wider ellipse fitting set. This set has been ordered according to a fitness measure. A best fitting ellipse set has then been formed from the previous set, and two ellipses have been obtained from it. Finally, a cost function that measures the quality of the detection task has been introduced. This function is suitable for an automatic procedure. The method gives good face detection results under different experimental conditions. Further work has to be done to generalise the method, considering also ellipses whose semi-axes are oriented differently from the Cartesian axes. An open problem is to evaluate the behaviour of the head detection algorithm when it is interfaced with the subsequent recognition task in a fully automatic recognition system.
Abstract

This article presents a novel technique for crowd motion estimation using the invertible rapid transform (IRT). The new method was used to estimate crowd motion from image data sequences captured at railway stations in large cities.
1. Introduction

Understanding crowd behaviour in semi-confined spaces is important for the design of new pedestrian facilities and of major layout modifications to existing areas, and for the daily management of crowds at football matches, pop concerts, carnivals and airports. Even the day-to-day movement of commuters in and out of large cities is a substantial problem, with serious consequences for human life and safety and for public order if it is not managed successfully [1,2].

Human observers of crowds, particularly those experienced in managing crowds in public places, can detect many crowd features, in some cases quite easily. Normally they can distinguish between a moving and a stationary crowd and estimate the majority direction and speed of movement of a large crowd. For existing facilities, there is an established practice of extensive closed-circuit television monitoring of crowds. However, human observers positioned to watch the TV monitors of such systems are not sufficient for obtaining real-time data by watching recorded video sequences. There is thus a considerable benefit in developing methods for automatically collecting crowd description data by applying image processing techniques to the video sequences [3, 4, 5]. These methods are based on well-established image processing techniques and are able to monitor and collect data about key features of crowds: stationarity, density and motion.

In particular, crowd motion estimation is based on two well-known methods: optical flow and block-matching motion detection [3, 4]. The motion vectors calculated by these methods may be used to build a polar plot (showing velocity magnitude and direction) for a moving crowd, in which the dominant motion tendency of the crowd can be seen. Unfortunately, these methods are rather complicated and not suitable for straightforward real-time implementation.
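To illustrate how motion vectors from block matching or optical flow can be summarised in such a polar plot, the sketch below accumulates vector magnitudes into direction sectors and reports the dominant one. It is a hypothetical helper, not part of the IRT method presented in this article. The 8-sector resolution and the angle convention (counter-clockwise from +x) are our assumptions. Because image rows grow downwards, these angles read clockwise on screen.

```python
import math


def polar_histogram(vectors, sectors=8):
    """Sum motion-vector magnitudes (dx, dy) into equal direction sectors."""
    hist = [0.0] * sectors
    width = 2 * math.pi / sectors
    for dx, dy in vectors:
        mag = math.hypot(dx, dy)
        if mag == 0:
            continue  # stationary blocks carry no direction
        ang = math.atan2(dy, dx) % (2 * math.pi)
        # Sector 0 is centred on the +x direction.
        idx = int(((ang + width / 2) % (2 * math.pi)) // width) % sectors
        hist[idx] += mag
    return hist


def dominant_direction_deg(vectors, sectors=8):
    """Centre angle (degrees) and accumulated magnitude of the strongest sector."""
    hist = polar_histogram(vectors, sectors)
    best = max(range(sectors), key=hist.__getitem__)
    return best * 360.0 / sectors, hist[best]


print(dominant_direction_deg([(0, 2), (0, 3), (1, 0), (0.1, 2)]))
```

Here three of the four vectors point roughly along +y, so the dominant sector is the one centred on 90 degrees, with an accumulated magnitude of about 7.0.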

This article presents a novel technique for crowd motion estimation using the invertible rapid transform (IRT) [5]. The new method was used to estimate crowd motion from image data sequences captured at railway stations in large cities.

2. Invertible rapid transform 

The rapid transform (RT) [6] is a fast shift-invariant transform, well known in the field of pattern recognition. The RT has some interesting properties, such as invariance to cyclic shifts, to reflections of the data sequence, and to slight rotations of a two-dimensional pattern [6, 7]. The RT is applicable to both binary and analogue inputs and can be extended to multiple dimensions [7]. Because the calculations are recursive and use very simple operators, it can be easily implemented in both software and dedicated digital hardware [8]. The RT has been used in the recognition of alphanumeric characters [6, 9, 10], and in robotics and scene analysis [10, 11]. The RT is nonlinear and noninvertible, because each step keeps only the absolute value of a difference and discards its sign. Even so, it is possible to recover the original signal x from its rapid transform sequence RT{x} by computing additional data, known as the matrix of states K for the 1D-RT or the system of matrices of states $K_p$ for the 2D-RT. In this way the invertible rapid transform (IRT) can be defined [12, 13]. The IRT may be used for signal coding, motion estimation and nonlinear filtering [14].
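The following Python sketch shows how the matrix of states makes the RT invertible. It assumes one common constant-geometry formulation of the 1D RT. At each of the n = log2 N steps, element i is paired with element i + N/2. Their sum and absolute difference are written to positions 2i and 2i + 1, and the sign of the difference is recorded as in eq. (1), with ties mapped to 1. The output ordering and the function names are assumptions and are not taken from Fig. 1. Integer inputs keep the inversion exact.

```python
import numpy as np


def rapid_transform(x):
    """Forward 1D rapid transform of x (length N = 2**n).

    Returns (rt, K): the RT sequence and the n x N/2 binary matrix of
    states, K[r, i] = 1 if x^(r)(i) - x^(r)(i + N/2) >= 0 at step r + 1.
    """
    y = np.asarray(x, dtype=np.int64).copy()
    N = y.size
    n = N.bit_length() - 1
    if n == 0 or N != 1 << n:
        raise ValueError("length must be a power of two >= 2")
    half = N // 2
    K = np.zeros((n, half), dtype=np.uint8)
    for r in range(n):
        a, b = y[:half], y[half:]
        K[r] = a - b >= 0                 # sign information discarded by the RT
        out = np.empty_like(y)
        out[0::2] = a + b                 # sum branch
        out[1::2] = np.abs(a - b)         # absolute-difference branch
        y = out
    return y, K


def inverse_rapid_transform(rt, K):
    """Recover x from RT{x} and its matrix of states K (steps undone in reverse)."""
    y = np.asarray(rt, dtype=np.int64).copy()
    for r in range(K.shape[0] - 1, -1, -1):
        s, d = y[0::2], y[1::2]
        big, small = (s + d) // 2, (s - d) // 2    # exact: s and d have equal parity
        first_larger = K[r].astype(bool)
        a = np.where(first_larger, big, small)       # x(i)
        b = np.where(first_larger, small, big)       # x(i + N/2)
        y = np.concatenate([a, b])
    return y
```

Take x = [3, 1, 4, 1, 5, 9, 2, 6]. Calling `rapid_transform(np.roll(x, 3))` returns the same RT sequence as `rapid_transform(x)` but a different K. This is why the states, not the RT itself, can encode a cyclic shift.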

The signal flow graph [12, 13] for computing the 1D-IRT is shown in Fig. 1.

Fig. 1. Signal flow graph of the 1D IRT


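The matrix of states K, or the system of matrices of states $K_p$, may be computed as follows. For the one-dimensional case:

$$k(i,r) = \begin{cases} 0, & \text{if } x^{(r-1)}(i) - x^{(r-1)}(i+N/2) < 0 \\ 1, & \text{if } x^{(r-1)}(i) - x^{(r-1)}(i+N/2) > 0 \end{cases} \qquad (1)$$

The dimension of matrix K is n × N/2. For the two-dimensional case:

$$K_1^{(r)}(i,j) = \begin{cases} 1, & \text{if } x^{(r-1)}(i,j) - x^{(r-1)}(i+N/2,j) > 0 \\ 0, & \text{if } x^{(r-1)}(i,j) - x^{(r-1)}(i+N/2,j) < 0 \end{cases}$$

$$K_2^{(r)}(i,j) = \begin{cases} 1, & \text{if } x^{(r-1)}(i,j+N/2) - x^{(r-1)}(i+N/2,j+N/2) > 0 \\ 0, & \text{if } x^{(r-1)}(i,j+N/2) - x^{(r-1)}(i+N/2,j+N/2) < 0 \end{cases}$$

$$K_3^{(r)}(i,j) = \begin{cases} 1, & \text{if } [x^{(r-1)}(i,j) + x^{(r-1)}(i+N/2,j)] - [x^{(r-1)}(i,j+N/2) + x^{(r-1)}(i+N/2,j+N/2)] > 0 \\ 0, & \text{if } [x^{(r-1)}(i,j) + x^{(r-1)}(i+N/2,j)] - [x^{(r-1)}(i,j+N/2) + x^{(r-1)}(i+N/2,j+N/2)] < 0 \end{cases}$$

$$K_4^{(r)}(i,j) = \begin{cases} 1, & \text{if } |x^{(r-1)}(i,j) - x^{(r-1)}(i+N/2,j)| - |x^{(r-1)}(i,j+N/2) - x^{(r-1)}(i+N/2,j+N/2)| > 0 \\ 0, & \text{if } |x^{(r-1)}(i,j) - x^{(r-1)}(i+N/2,j)| - |x^{(r-1)}(i,j+N/2) - x^{(r-1)}(i+N/2,j+N/2)| < 0 \end{cases} \qquad (2)$$

where (r) is the transform step of the IRT and

i, j = 0, 1, ..., (N/2 - 1)
r = 1, 2, ..., n
p = 1, 2, 3, 4

The system of matrices of states $K_p$ is illustrated in Fig. 2.

Fig. 2. The system of matrices of states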
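The four conditions of eq. (2) can be read as one separable step. Values are first paired along i ($K_1$, $K_2$), and the resulting sums and absolute differences are then paired along j ($K_3$, $K_4$). The Python sketch below assumes this reading. It computes the first set of matrices of states for an N × N block, with the first array index playing the role of i, and then undoes the step from the four output arrays plus the states. The output layout and all names are illustrative assumptions, and ties are mapped to 1.

```python
import numpy as np


def rt2d_first_stage(X):
    """One 2D-IRT step on an N x N block: four N/2 x N/2 outputs and K_1..K_4."""
    X = np.asarray(X, dtype=np.int64)
    h = X.shape[0] // 2
    a, b = X[:h, :h], X[h:, :h]          # x(i, j), x(i + N/2, j)
    c, d = X[:h, h:], X[h:, h:]          # x(i, j + N/2), x(i + N/2, j + N/2)
    s1, d1 = a + b, np.abs(a - b)        # pairing along i, left half
    s2, d2 = c + d, np.abs(c - d)        # pairing along i, right half
    K = np.stack([a - b >= 0, c - d >= 0, s1 - s2 >= 0, d1 - d2 >= 0]).astype(np.uint8)
    Y = np.stack([s1 + s2, np.abs(s1 - s2), d1 + d2, np.abs(d1 - d2)])
    return Y, K


def _unpair(s, dabs, k):
    """Invert (p + q, |p - q|) to (p, q), using k = 1 when p >= q."""
    big, small = (s + dabs) // 2, (s - dabs) // 2
    return np.where(k == 1, big, small), np.where(k == 1, small, big)


def rt2d_first_stage_inverse(Y, K):
    s1, s2 = _unpair(Y[0], Y[1], K[2])   # a + b and c + d, via K_3
    d1, d2 = _unpair(Y[2], Y[3], K[3])   # |a - b| and |c - d|, via K_4
    a, b = _unpair(s1, d1, K[0])         # via K_1
    c, d = _unpair(s2, d2, K[1])         # via K_2
    return np.block([[a, c], [b, d]])
```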


3. Motion estimation algorithm 
with use of IRT 

The motion estimation algorithms rest on two presumptions. First, the matrix K, or the system $K_p$, contains relevant information about the picture [14, 15, 16]. Second, motion in the picture influences the first column of K, or the first set of matrices $K_p$ (i.e. $K_1^{(1)}, K_2^{(1)}, K_3^{(1)}, K_4^{(1)}$), most strongly. Cyclic translations in the image are deterministically and unambiguously encoded in the values of these matrices. The matrices K and $K_p$ are binary and can be computed with a simple, and thus very fast, algorithm that uses only comparison, addition and subtraction. As a result, the subblock matching criterion for motion estimation is simple to compute, since it is based on bit-by-bit modulo-2 additions [16].

First, the image is divided into smaller rectangular areas, which we call subblocks (Fig. 3). Let $U_k$ be an N×N subblock of frame k and $U_{k-1}$ the equivalent subblock of frame k-1. Let the search area (SA) be an (N+2dm)×(N+2dm) region of frame k-1, centred at the same spatial location as $U_k$, with $U_{k-1}$ a subblock of SA. Here dm is the maximum displacement allowed in either direction, as an integer number of pixels.


Fig. 3: Positions of subblocks $U_k$, $U_{k-1}$ and SA in frames k and k-1


3.1 Motion estimation algorithm with
use of 1D-IRT

Let Kr and Kc be the matrices of states computed by row and by column of a subblock, respectively.

STEP 1: Compute Kr(k), Kc(k) for subblock $U_k$.

STEP 2: Compute Kr(k-1), Kc(k-1) for subblock $U_{k-1}$.

STEP 3: Compute the matching criterion

$$\sigma_{row}(u,v) = \sum_{r=0}^{N-1}\sum_{i=0}^{N/2-1}\sum_{j=0}^{\mu}\left(Kr^{r}_{(k)}(i,j) \oplus Kr^{r}_{(k-1)}(i,j)\right)$$

$$\sigma_{col}(u,v) = \sum_{r=0}^{N-1}\sum_{i=0}^{N/2-1}\sum_{j=0}^{\mu}\left(Kc^{r}_{(k)}(i,j) \oplus Kc^{r}_{(k-1)}(i,j)\right)$$

$$u, v \in [-d_m, d_m] \qquad (3)$$

where the superscript r denotes the r-th row (Kr) or column (Kc) of the subblock and ⊕ denotes bit-by-bit modulo-2 addition.

Repeat steps 2 and 3 for every possible position (u, v) of subblock $U_{k-1}$ in the search area SA ($(2d_m+1)^2$ cycles).

STEP 4: The desired motion vector corresponds to the position $(u_0, v_0)$ of subblock $U_{k-1}$ with the minimal value of σ(u, v).

Modifications of the algorithm - I
μ - number of used columns of matrix K,
μ ∈ {0, 1, ..., n-1}

Modifications of the algorithm - II

4a. $(u_0, v_0)$: $\sigma_{row}(u_0,v_0) = \min \sigma_{row}(u,v)$
4b. $(u_0, v_0)$: $\sigma_{col}(u_0,v_0) = \min \sigma_{col}(u,v)$
4c. $(u_0, v_0)$: $\sigma_{row}(u_0,v_0) = \min \sigma_{row}(u,v) \wedge \sigma_{col}(u_0,v_0) = \min \sigma_{col}(u,v)$
4d. $(u_0, v_0)$: $\sigma_{rc}(u_0,v_0) = \min \sigma_{rc}(u,v)$ (4)

where

$$\sigma_{rc}(u,v) = \sigma_{row}(u,v) + \sigma_{col}(u,v) \qquad (5)$$
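Below is a minimal Python sketch of the 1D-IRT matching in Section 3.1. It makes these assumptions:
- frames are integer-valued;
- the search area lies entirely inside frame k-1;
- the criterion is variant 4d ($\sigma_{rc}$);
- matrices of states are stored step-first, so keeping steps 0..μ plays the role of using the first μ + 1 columns of K.

The returned pair is the offset $(u_0, v_0)$ of the best subblock $U_{k-1}$ relative to $U_k$. All function names are hypothetical.

```python
import numpy as np


def rt_states(x):
    """Binary matrix of states of the 1D rapid transform, shape (n, N/2), step first."""
    y = np.asarray(x, dtype=np.int64)
    half = y.size // 2
    states = []
    for _ in range(y.size.bit_length() - 1):              # n = log2(N) steps
        a, b = y[:half], y[half:]
        states.append(a - b >= 0)                          # eq. (1), ties mapped to 1
        y = np.column_stack([a + b, np.abs(a - b)]).ravel()  # interleave sum / |difference|
    return np.array(states, dtype=np.uint8)


def block_states(block, mu):
    """Kr (one matrix per row) and Kc (one per column), keeping steps 0..mu only."""
    Kr = np.array([rt_states(row)[: mu + 1] for row in block])
    Kc = np.array([rt_states(col)[: mu + 1] for col in block.T])
    return Kr, Kc


def irt_block_match(frame_k, frame_km1, top, left, N, dm, mu=0):
    """Offset (u0, v0) of the best-matching subblock of frame k-1 (variant 4d)."""
    ref_r, ref_c = block_states(frame_k[top:top + N, left:left + N], mu)
    best = None
    for u in range(-dm, dm + 1):                  # row offset inside the search area
        for v in range(-dm, dm + 1):              # column offset
            cand = frame_km1[top + u:top + u + N, left + v:left + v + N]
            kr, kc = block_states(cand, mu)
            sigma_row = np.count_nonzero(ref_r ^ kr)   # modulo-2 addition, count ones
            sigma_col = np.count_nonzero(ref_c ^ kc)
            sigma_rc = sigma_row + sigma_col           # eq. (5)
            if best is None or sigma_rc < best[0]:
                best = (sigma_rc, u, v)
    return best[1], best[2]
```

Consider a synthetic pair of frames in which frame k is frame k-1 cyclically shifted two rows down and one column left. The search returns (-2, 1): the matching subblock of frame k-1 lies two rows up and one column to the right.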


3.2 Motion estimation algorithm 
with use of 2D-IRT 


STEP 1: Compute the first set of $K_p$ (i.e. $K_1^{(1)}(k)$, $K_2^{(1)}(k)$, $K_3^{(1)}(k)$, $K_4^{(1)}(k)$) for block $U_k$.

STEP 2: Compute the first set of $K_p$ (i.e. $K_1^{(1)}(k-1)$, $K_2^{(1)}(k-1)$, $K_3^{(1)}(k-1)$, $K_4^{(1)}(k-1)$) for block $U_{k-1}$.

STEP 3: Compute the matching criterion

$$\sigma(u,v) = \sum_{p=1}^{\chi}\sum_{i=0}^{N/2-1}\sum_{j=0}^{N/2-1}\left(K_{p(k)}^{(1)}(i,j) \oplus K_{p(k-1)}^{(1)}(i,j)\right), \qquad u, v \in [-d_m, d_m] \qquad (6)$$

where ⊕ denotes bit-by-bit modulo-2 addition.

Repeat steps 2 and 3 for every possible position (u, v) of subblock $U_{k-1}$ in the search area SA ($(2d_m+1)^2$ cycles).

Modifications of the algorithm
3a. χ = 1
3b. χ = 2
3c. χ = 3
3d. χ = 4

STEP 4: The desired motion vector corresponds to the position $(u_0, v_0)$ of subblock $U_{k-1}$ with the minimal value of σ(u, v), i.e.

$$(u_0, v_0) = \arg\min_{u,v} \sigma(u,v) \qquad (7)$$
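The 2D variant differs only in which states are compared. The Python sketch below uses hypothetical names, maps ties to 1, and keeps the same integer-frame and search-area assumptions as the 1D case. It computes the first set $K_1^{(1)}, \ldots, K_4^{(1)}$ for each subblock, keeps the first χ of them as in modifications 3a to 3d, and minimizes eq. (6).

```python
import numpy as np


def first_state_set(block):
    """K_1..K_4 of the first 2D-IRT step for an N x N block, as in eq. (2)."""
    X = np.asarray(block, dtype=np.int64)
    h = X.shape[0] // 2
    a, b = X[:h, :h], X[h:, :h]        # x(i, j), x(i + N/2, j)
    c, d = X[:h, h:], X[h:, h:]        # x(i, j + N/2), x(i + N/2, j + N/2)
    return np.stack([a - b >= 0,
                     c - d >= 0,
                     (a + b) - (c + d) >= 0,
                     np.abs(a - b) - np.abs(c - d) >= 0])


def irt2d_block_match(frame_k, frame_km1, top, left, N, dm, chi=4):
    """Offset (u0, v0) minimising eq. (6) over the search area, as in eq. (7)."""
    ref = first_state_set(frame_k[top:top + N, left:left + N])[:chi]
    best = None
    for u in range(-dm, dm + 1):
        for v in range(-dm, dm + 1):
            cand = frame_km1[top + u:top + u + N, left + v:left + v + N]
            states = first_state_set(cand)[:chi]
            sigma = np.count_nonzero(ref ^ states)    # modulo-2 sum, then count ones
            if best is None or sigma < best[0]:
                best = (sigma, u, v)
    return best[1], best[2]
```

With χ = 1, each comparison uses only N²/4 bits. In this sketch, larger blocks or larger χ make accidental ties less likely.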

4. Crowd motion estimation 

The new method of crowd motion estimation was used to estimate crowd motion from image data sequences captured at railway stations in large cities. For successive image frames (Fig. 4), the subblock displacement vectors were calculated using the IRT. The vectors were then used to construct a polar plot (showing velocity magnitude v and direction s) for the moving crowd. To do so, they were aggregated into discrete direction 'bins' of various sizes (Fig. 5). From these polar histograms the dominant motion tendency of the crowd may be clearly identified. Irregular motion, movements of arms, legs and clothing, and localized variations of brightness all cause errors in the computed motion vectors compared to the actual overall motion of the individuals in the crowd. These errors can easily be removed (filtered) from the polar diagrams, which improves both human and machine interpretation of crowd motion. The experiments performed indicate that IRT motion estimation gives good results in terms of computational cost, speed and motion estimation accuracy.
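The polar histogram described above can be built as follows; this Python sketch is one possible approach. Displacement vectors (dx, dy) are binned by direction into n_bins sectors centred on multiples of 2π/n_bins, and each bin reports a count and a mean speed. The minimum-speed threshold that discards near-zero vectors is a hypothetical filtering choice, not the authors' procedure.

```python
import numpy as np


def polar_histogram(vectors, n_bins=8, min_speed=0.5):
    """Bin displacement vectors (dx, dy) by direction.

    Returns (counts, mean_speed) per direction bin. Bin k is centred on
    the angle k * 2*pi / n_bins measured from the +x axis. Vectors shorter
    than min_speed are treated as noise and ignored (hypothetical filter).
    """
    V = np.asarray(vectors, dtype=float).reshape(-1, 2)
    speed = np.hypot(V[:, 0], V[:, 1])
    keep = speed >= min_speed
    angle = np.arctan2(V[keep, 1], V[keep, 0]) % (2 * np.pi)
    width = 2 * np.pi / n_bins
    idx = np.floor(angle / width + 0.5).astype(int) % n_bins
    counts = np.bincount(idx, minlength=n_bins)
    total = np.bincount(idx, weights=speed[keep], minlength=n_bins)
    mean_speed = np.divide(total, counts, out=np.zeros(n_bins), where=counts > 0)
    return counts, mean_speed
```

The dominant motion tendency is then the bin with the largest count, for example `int(np.argmax(counts))`. Changing n_bins reproduces the "various bin size" plots.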

5. Conclusions 

This paper has shown that it is possible to use the IRT for estimation of crowd motion. The method discussed relies on simple operations suitable for real-time implementation.
ABSTRACT

Flame images are a useful source of information to characterize combustion processes in industrial plants. However, the segmentation of flame images is difficult. For example, the intensity of the background is often higher than the intensity of the flame itself. This requires the use of a priori knowledge about the background and flame characteristics.

This paper presents a Bayesian approach to the segmentation of flame images. The flame boundary is estimated by the MAP method, using a probabilistic model of the image. Prior information about the contour shape is provided by a Markov random field. Robust estimation techniques are used to improve the performance in the presence of outliers. Experimental tests with industrial flame images from the Setubal thermoelectric plant are provided to evaluate the performance of the algorithm. They show that the proposed algorithm discriminates between flame and background.

1. INTRODUCTION 

Flame images are a useful source of information about the characteristics of combustion processes in industrial plants. The use of this information in automatic monitoring/control systems requires the evaluation of the flame boundary. This is a difficult problem where classic image segmentation algorithms (e.g., thresholding, region growing [9]) fail. For example, in boilers with multiple flames, the intensity of the background is often higher than the intensity of the monitored flame. This happens because the background is affected by the radiation of all the flames, and some of them can be directly observed. These difficulties require the use of a priori knowledge about the background and flame characteristics.

This paper describes a Bayesian approach to the segmentation of industrial flames based on Markov random fields [5] and robust statistics [3]. The flame boundary is evaluated by a MAP estimator. The posterior probability density function is computed by using an image formation model and a shape model (Markov random field). This approach is inspired by the work of Figueiredo and Leitao [4] on the estimation of ventricular contours. These ideas are extended here by the use of robust statistics, a key issue in the application of image analysis techniques to real data. Probabilistic models adapted to flame images are also presented.

The paper is organized as follows. Section 2 formulates flame boundary extraction as a MAP estimation problem. Sections 3 and 4 present the image formation model and a probabilistic model of the flame boundary. Section 5 describes a flame segmentation algorithm based on MAP estimation. Experimental results are given in section 6, and section 7 presents the conclusions.

2. BAYESIAN APPROACH 

Let x be a vector of parameters defining the boundary of a flame present in an observed image I. Assuming that x and I are random variables with known joint distribution, the maximum a posteriori (MAP) estimate of x given the image I is defined by

$$\hat{x}_{MAP} = \arg\max_{x}\, p(x \mid I) \qquad (1)$$

The a posteriori probability $p(x \mid I)$ is obtained using Bayes' law

$$p(x \mid I) = \frac{p(I \mid x)\, p(x)}{p(I)} \qquad (2)$$

where $p(x)$ is the a priori probability density function of the contour (prior), $p(I \mid x)$ is the conditional probability density function of the image given the contour, and $p(I)$ is a normalization factor. To compute $\hat{x}_{MAP}$ it is necessary to define the prior $p(x)$ and the likelihood function $p(I \mid x)$, and to optimize their product.

Figure 1 shows a typical flame image obtained at the Setubal thermoelectric plant. Let us consider a set of L horizontal lines defined in the image, and let $x_i$ denote the abscissa of the intersection of the flame boundary with the i-th horizontal line (see figure 2). The flame boundary will be described by the vector

$$x = [x_1, x_2, \ldots, x_L] \qquad (3)$$

containing the abscissas of all intersection points.

Figure 1: Original image.

Figure 2: Boundary representation.

3. IMAGE MODEL 

The observed image consists of two basic components: a structured background and the flame (see figure 1). The flame region is characterized by high intensity values which saturate the camera. This may not be true near the flame boundary, where image values depend on both the flame and the background, but it is an accurate assumption inside the flame region.

Figure 3 shows the intensity profile of a row of the image in figure 1. The evolution of intensity in the interval [0, 125] follows the background. After column 125, intensity grows until it saturates due to the presence of the flame. The transition from the background to the flame region occurs at approximately $x_i = 125$.

Figure 3: Intensity values of one line from the observed image.

We shall assume that the observed image I is the superposition of a deterministic image $I(x)$ and a noise image W with Gaussian distribution. Let $I_i$, $I_i(x_i)$ and $W_i$ denote the i-th line of these images. Therefore

$$I_i = I_i(x_i) + W_i \qquad (4)$$

where $I(x)$ is equal to the background image B inside the background region and equal to a constant value S in the flame region. Denoting the column index by j, this means

$$I_i(x_i)(j) = \begin{cases} B_i(j), & j < x_i \\ S, & j \ge x_i \end{cases} \qquad (5)$$

Two possible background image models are obtained by low-pass filtering the images of figure 4. These images were created by removing the main flame in figure 1.

Figure 5 displays a row of $I(x)$ assuming that $x_i = 125$ (solid line) and the background profile (dashed line) obtained from figure 4(a).

Figure 4: Background images ((a) and (b)).

Figure 5: Image line model with $x_i = 125$ (solid line) and background model (dashed line).

Figure 6: Lorentzian function with $\bar{x}_i = 125$ and $\beta = 5$.

Assuming that $W_i \sim \mathcal{N}(0, R)$, it can be concluded from (4) and (5) that

$$p(I_i \mid x_i) \propto \exp\left\{-\frac{1}{2}\,(I_i - I_i(x_i))^T R^{-1} (I_i - I_i(x_i))\right\} \qquad (6)$$

Furthermore, it will be assumed that image lines are independent and that the components of $W_i$ are uncorrelated random variables with variance $\sigma^2$. Simplifying (6) and taking the logarithm of both members, one obtains

$$\log p(I \mid x) = -\frac{1}{2\sigma^2}\sum_{i=1}^{L}\|I_i - I_i(x_i)\|^2 + \text{const} \qquad (7)$$

where $\|\cdot\|$ denotes the Euclidean norm.
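Here is a short Python sketch of the line model and likelihood above. It assumes, as in eq. (5), that the background lies to the left of $x_i$ and the saturated flame, of constant value S, from column $x_i$ onwards. Because σ only scales the cost, the per-line maximum-likelihood abscissa can be found for all candidates at once with prefix sums. This is one convenient way to obtain the ML initialization mentioned in Section 5, not necessarily the authors' implementation. The function names are hypothetical.

```python
import numpy as np


def line_model(background_row, x_i, S):
    """I_i(x_i) of eq. (5): background left of x_i, constant S from column x_i on."""
    model = np.asarray(background_row, dtype=float).copy()
    model[x_i:] = S
    return model


def line_log_likelihood(obs_row, background_row, x_i, S, sigma):
    """log p(I_i | x_i) from eq. (7), dropping the additive constant."""
    r = np.asarray(obs_row, dtype=float) - line_model(background_row, x_i, S)
    return -np.dot(r, r) / (2.0 * sigma ** 2)


def ml_boundary(obs, background, S):
    """Per-line ML abscissa for every line of an L x W image (candidates 0..W)."""
    obs = np.asarray(obs, dtype=float)
    B = np.asarray(background, dtype=float)
    zeros = np.zeros((obs.shape[0], 1))
    # bg[:, x] = sum_{j < x} (I - B)^2,  fl[:, x] = sum_{j >= x} (I - S)^2
    bg = np.hstack([zeros, np.cumsum((obs - B) ** 2, axis=1)])
    fl = np.hstack([np.cumsum(((obs - S) ** 2)[:, ::-1], axis=1)[:, ::-1], zeros])
    return np.argmin(bg + fl, axis=1)
```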

4. CONTOUR MODEL 

In this paper, the flame boundary is modelled as a unidimensional Markov random field with Gibbs probability density function [5]

$$p(x) = \frac{1}{Z}\exp\left\{-\sum_{C} V_C(x)\right\} \qquad (8)$$

where $Z = \sum_{x}\exp\left\{-\sum_{C} V_C(x)\right\}$ is the partition function and $V_C(x)$ is the potential of clique C. A clique is an isolated site, or a set of sites such that any two sites in C are neighbors. Clique potentials are either defined by the user or estimated from a large set of data (field realizations). In this paper, the first strategy is adopted.

Since the Gibbs distribution (8) is used as a prior in the estimation of the flame boundary, it should contain the a priori knowledge available about the unknown parameters. The flame boundary is a smooth curve with a known average shape (see figure 7). We shall use zero-order and second-order cliques to model this information.

Zero-order cliques are used to measure the distance of the estimated flame boundary from the average shape. The potential of a zero-order clique is defined by the Lorentzian function [3]
$$V_{C_0}(x_i) = \log\left\{1 + \frac{1}{2}\left(\frac{x_i - \bar{x}_i}{\beta}\right)^2\right\} \qquad (9)$$

as shown in figure 6. Here $\bar{x}$ is the mean boundary, defined by the user or computed from a large set of data, and β is a scale parameter.

Figure 7: Mean shape.

The probability density function associated with a zero-order clique,

$$p(x_i) \propto e^{-V_{C_0}(x_i)} = \frac{1}{1 + \frac{1}{2}\left(\frac{x_i - \bar{x}_i}{\beta}\right)^2} \qquad (10)$$

has longer tails than the ubiquitous normal distribution. Therefore, outliers (boundary values far from the average) have a smaller influence on the estimates, leading to robust estimation procedures.

To increase the smoothness of the estimated contours, second-order cliques {i-1, i, i+1} are used. The clique potentials are defined by

$$V_{C_2}(x_{i-1}, x_i, x_{i+1}) = \frac{1}{2\lambda^2}\left(x_{i-1} - 2x_i + x_{i+1}\right)^2 \qquad (11)$$

These are regularization terms similar to the ones used in ill-conditioned computer vision problems [6] or in active contours [7], [1].

Using equations (8), (9) and (11), one obtains the prior

$$p(x) \propto \prod_{i} \frac{\exp\left\{-\frac{1}{2\lambda^2}\left(x_{i-1} - 2x_i + x_{i+1}\right)^2\right\}}{1 + \frac{1}{2}\left(\frac{x_i - \bar{x}_i}{\beta}\right)^2} \qquad (12)$$

5. MAP ESTIMATOR

Let us now discuss the computation of the MAP shape estimate. Substituting (2) and (8) into (1),

$$\hat{x}_{MAP} = \arg\max_{x}\,[p(I \mid x)\, p(x)] = \arg\min_{x}\left\{-\log p(I \mid x) + \sum_{C} V_C(x)\right\} \qquad (13)$$
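The cost in eq. (13) can be evaluated directly. The Python sketch below adds three terms: the data term of eq. (7), the Lorentzian zero-order potentials of eq. (9), and the second-order potentials of eq. (11). The argument names (mean_shape for $\bar{x}$, lam for λ) and the left-background/right-flame line model of eq. (5) are assumptions. The last check illustrates the robustness argument. A site 100 pixels from its mean costs only log(201) ≈ 5.3 under the Lorentzian, whereas a Gaussian potential with the same β would charge 200.

```python
import numpy as np


def posterior_energy(x, obs, background, S, mean_shape, sigma, beta, lam):
    """Cost minimised in eq. (13): -log p(I|x) + sum_C V_C(x), up to a constant."""
    x = np.asarray(x, dtype=int)
    obs = np.asarray(obs, dtype=float)
    B = np.asarray(background, dtype=float)
    data = 0.0
    for i, xi in enumerate(x):                      # eq. (7), one image line per site
        model = B[i].copy()
        model[xi:] = S                              # flame from column x_i onwards (eq. (5))
        data += np.sum((obs[i] - model) ** 2) / (2.0 * sigma ** 2)
    t = (x - np.asarray(mean_shape, dtype=float)) / beta
    zero_order = np.sum(np.log1p(0.5 * t ** 2))     # Lorentzian, eq. (9)
    d2 = x[:-2] - 2 * x[1:-1] + x[2:]               # second differences, eq. (11)
    second_order = np.sum(d2.astype(float) ** 2) / (2.0 * lam ** 2)
    return data + zero_order + second_order
```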

The evaluation of MAP estimates requires the optimization of a non-convex function with a large number of variables. Several methods have been proposed to tackle this problem (e.g. the Metropolis algorithm [8], the Gibbs sampler [5] or iterated conditional modes [2]). We have used the iterated conditional modes (ICM) algorithm proposed by Besag in [2]. ICM is a deterministic relaxation algorithm which performs a minimization with respect to a single variable in each iteration. Let $\{i_1, i_2, \ldots, i_t, \ldots\}$ be a sequence of sites, in which each site must occur an infinite number of times. At the t-th iteration, $x_{i_t}$ is modified to minimize the cost function while keeping the other variables constant. This leads to a recursive update law


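$$x_i = \arg\min_{x_i}\left\{\frac{1}{2\sigma^2}\|I_i - I_i(x_i)\|^2 + \log\left[1 + \frac{1}{2}\left(\frac{x_i - \bar{x}_i}{\beta}\right)^2\right] + \frac{6}{2\lambda^2}\left(x_i - \frac{-x_{i-2} + 4x_{i-1} + 4x_{i+1} - x_{i+2}}{6}\right)^2\right\} \qquad (14)$$

The algorithm is initialized using the maximum likelihood estimate of the flame boundary.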
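Below is a minimal Python sketch of ICM for eq. (14), using the same line-model assumption as eq. (5). Rather than the completed-square form of the smoothness term, it evaluates, for every candidate abscissa, the (up to) three second-order cliques that contain $x_i$; this is equivalent up to a constant. Sites are visited in raster order, and the sweeps stop once no site changes. Parameter names and the sweep limit are hypothetical.

```python
import numpy as np


def icm_boundary(obs, background, S, mean_shape, sigma, beta, lam, n_sweeps=20):
    """Iterated conditional modes for eq. (14), initialised with the ML boundary.

    obs, background: L x W arrays; the candidate abscissas are 0..W.
    """
    obs = np.asarray(obs, dtype=float)
    B = np.asarray(background, dtype=float)
    L, W = obs.shape
    zeros = np.zeros((L, 1))
    # data[i, c] = ||I_i - I_i(c)||^2 / (2 sigma^2) for every candidate c (eqs. (5), (7))
    bg = np.hstack([zeros, np.cumsum((obs - B) ** 2, axis=1)])
    fl = np.hstack([np.cumsum(((obs - S) ** 2)[:, ::-1], axis=1)[:, ::-1], zeros])
    data = (bg + fl) / (2.0 * sigma ** 2)
    cand = np.arange(W + 1, dtype=float)
    x = cand[np.argmin(data, axis=1)]                # ML initialisation
    for _ in range(n_sweeps):
        changed = False
        for i in range(L):                           # visit sites in raster order
            e = data[i] + np.log1p(0.5 * ((cand - mean_shape[i]) / beta) ** 2)
            for c in (i - 1, i, i + 1):              # second-order cliques holding x_i
                if 1 <= c <= L - 2:
                    w = (1.0, -2.0, 1.0)[i - c + 1]  # weight of x_i in this clique
                    rest = x[c - 1] - 2.0 * x[c] + x[c + 1] - w * x[i]
                    e = e + (rest + w * cand) ** 2 / (2.0 * lam ** 2)
            best = cand[np.argmin(e)]
            if best != x[i]:
                x[i], changed = best, True
        if not changed:                              # a full sweep without updates
            break
    return x.astype(int)
```

Each update can only lower the cost in eq. (13). In this sketch, ICM therefore settles in a local minimum near the ML initialization, which is why a good initial estimate matters.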


6. EXPERIMENTAL RESULTS 

Figure 8(a) shows the boundary estimates obtained using the background image of figure 4(a) and mild a priori shape information. In this case, acceptable estimates of the flame boundary are obtained in the upper and middle parts of the image. However, the algorithm is not able to separate the main flame from smaller ones observed in the lower part of the image. This problem can be overcome by using strong shape restrictions (small β). Unfortunately, this reduces the ability to correctly estimate shape deformations. One way to alleviate this difficulty is to include the other observed flames in the background image (see figure 4(b)). Since these regions are saturated, only the shape prior information will be relevant to estimate the main flame boundary there. The results obtained with this background model are shown in figure 8(b).
Figure 8: Contour estimates using the background images of figure 4 ((a) and (b)).

7. CONCLUSIONS 

This paper describes an algorithm for the segmentation of industrial flames in a cluttered background. The proposed method is a modified version of the algorithm developed by Figueiredo and Leitao in [4]. It is extended to encompass robust estimation techniques, which are instrumental in industrial applications. A Bayesian approach is adopted to estimate the flame boundary. The boundary is obtained by a MAP estimator using an image model and a shape prior. The image model takes into account the information available about the structure of the background. This information is obtained from an image of the background. A unidimensional Markov random field is used to model the shape prior. Lorentzian functions define the clique potentials, which improves the robustness of the estimates in the presence of shape outliers. This is considered a key feature in the performance of the algorithm with real images.

Flame images obtained inside a boiler of the Setubal thermoelectric plant were used to illustrate the algorithm performance.
ABSTRACT

Multiuser detectors for CDMA systems have been an active area of research for the last two decades. Most of these detectors are developed under the assumption that there is no interference external to the system. This paper presents a multisensor-multiuser detector for CDMA systems. It is able to estimate the spatial signature of all active users projected onto the subspace orthogonal to the external interference. With this information, a specific beamformer can be designed for each user that nulls both the external and the multiple access interference.

1. INTRODUCTION 

Direct-Sequence Code Division Multiple Access (DS-CDMA) is an accepted technique for future high-capacity digital wireless communications systems. Nevertheless, despite their many desirable features, CDMA systems are interference limited and suffer from the near-far problem. The near-far problem occurs when different users have dissimilar powers and their code sequences are not perfectly orthogonal. In an asynchronous DS-CDMA system it is impossible to guarantee that the users' received signals are orthogonal for every possible realization of the propagation delays. Another important limitation on the performance of wireless CDMA systems is multipath fading, which makes the received powers even more dissimilar. Furthermore, when there is a single fading path, the inherent temporal diversity of CDMA cannot be used to overcome fading.

The standard receiver for DS-CDMA is simply a bank of matched filters, each filter matched to a particular user code. The standard receiver works fairly well in a system with only a few users whose codes are almost orthogonal and whose received powers are equal. However, in a near-far situation the standard receiver fails. Multiple access interference is highly structured, and the signature waveforms of all users are available at the central receiver. This additional knowledge may therefore be exploited in the decision process. Multiuser detection is capable of eliminating the near-far problem and providing a capacity increase in CDMA systems [1][2]. On the other hand, spatial diversity reception combats the fading effects of the channel. Multiuser receivers incorporating explicit antenna diversity to overcome fading have therefore also been considered in the literature [3] [4]. The use of an antenna array has also been considered to cancel the users with higher power, avoiding the requirement of perfect power control [4] [5].

Nevertheless, the multiuser receivers developed so far do not consider the possibility of interference external to the system. We must take into account that most existing users in any given frequency band are narrowband. A certain level of out-of-band spurious emission is unavoidable and, in fact, legally permitted. In the limit, when the interferer gets very close to a base site, it can significantly degrade the capacity of the entire cell. Such jamming from existing services to new mobile services should be considered in the design of a high-performance receiver. Another kind of external interference is that caused by users in other cells, about which the centralized receiver has no information.

In this paper we present a multisensor-multiuser scheme able to overcome the near-far problem, external interference and multipath fading. The basic steps of the algorithm are the following. First, the received signal is fed to a bank of matched filters. Using the outputs of the filters matched to the active users, plus the output of one filter matched to an unused code, we first estimate the interference subspace. This information is then used jointly with the signal at the filter outputs to estimate the spatial signature of every user projected onto the subspace orthogonal to the external interference.

The rest of the paper is organized as follows. The next section describes the signal model. Section 3 formulates the proposed method. Section 4 presents some simulation results, and section 5 draws general conclusions.

2. PROBLEM FORMULATION 

The system under consideration is a K-user asynchronous DS-CDMA system using BPSK modulation and operating over a frequency non-selective channel. The baseband signal for the k-th user is given by

$$s_k(t) = \sum_{m} d_k[m]\, b_k(t - mT) \qquad (1)$$

The data stream $d_k[m] \in \{-1, +1\}$ is pulse amplitude modulated by a period of the code waveform $b_k(t)$, with $b_k(t) = 0$ for $t \notin [0, T)$ and T the bit time:

$$b_k(t) = \sum_{l=0}^{L-1} c_{kl}\, p_{T_c}(t - l T_c) \qquad (2)$$

$c_{kl}$ is the l-th chip in the k-th code and $p_{T_c}$ is a rectangular pulse of duration $T_c = T/L$.

The received baseband signal for the multichannel case is

$$x(t) = \sum_{k=1}^{K} \sqrt{P_k}\, s_k(t - \tau_k)\, a_k(t) + A_I\, i(t) + n(t) \qquad (3)$$

For the k-th user, $P_k$, $\tau_k$ and $a_k$ are respectively the transmitted power, the propagation delay and the steering vector, whose dimension equals the number of sensors N. Matrix $A_I$ contains the steering vectors of the interferences, and vector $i(t)$ contains the interfering signals at time t:

$$A_I = [a_{I,1} \cdots a_{I,I}] \qquad (4)$$

$$i(t) = [i_1(t) \cdots i_I(t)]^T \qquad (5)$$

with I the number of directional external interferences. No assumption about the temporal structure of the interferences is made.

$n(t)$ is the noise vector at the array input. The noise is considered white Gaussian, uncorrelated among different sensors and with equal variance $\sigma^2$ for all of them. As we are assuming frequency non-selective channels, $a_k(t)$ may be considered as the sum of P coherent paths [6]. We therefore call $a_k(t)$ the generalized steering vector or spatial signature of the k-th signal. This vector may be time-varying due to the combined effect of multipath and Doppler. Here, $a_k$ is assumed to be slowly varying compared to the symbol time:

$$a_k(t) = a_k(t + T) \qquad (6)$$

The model [6] and the proposed method are easily generalizable to frequency-selective channels.


3. MULTIUSER SEPARATION AND
INTERFERENCE SUPPRESSION

The received signal vector is fed to a bank of K + 1 filters. Each of the first K filters is matched to one of the K active users of the system. The last one is matched to an unused code. The sampled output of the l-th filter is:

$$z_l[n] = \frac{1}{T}\int_{nT+\tau_l}^{(n+1)T+\tau_l} x(t)\, b_l(t - nT - \tau_l)\, dt, \qquad l = 1, \ldots, K+1 \qquad (7)$$

Let

$$c_{kl}[n] = \frac{1}{T}\int_{nT+\tau_l}^{(n+1)T+\tau_l} s_k(t - \tau_k)\, b_l(t - nT - \tau_l)\, dt = \beta_{kl}\, d_k[n] + \gamma_{kl}\, d_k[n + \mathrm{sgn}(\tau_{lk})] \qquad (8)$$

Here $\tau_{lk} = \tau_l - \tau_k$. The sign function $\mathrm{sgn}(\tau)$ equals ±1 according to the sign of τ, or 0 if τ = 0. Furthermore,

$$\beta_{kl} = \frac{1}{T} R_{b_k b_l}(\tau_{lk}) \qquad (9)$$

$$\gamma_{kl} = \frac{1}{T} R_{b_k b_l}\big(\tau_{lk} - T\,\mathrm{sgn}(\tau_{lk})\big) \qquad (10)$$

$R_{b_k b_l}(\tau)$ denotes the cross-correlation function between the k-th and l-th signature signals at offset τ, that is

$$R_{b_k b_l}(\tau) = \int_0^T b_k(t + \tau)\, b_l(t)\, dt \qquad (11)$$

Note that $\beta_{lk} = \beta_{kl}$ and $\gamma_{lk} = \gamma_{kl}$. For the k-th user, $\beta_{kk} = 1$ and $\gamma_{kk} = 0$.
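Assume rectangular chips and delays that are whole multiples of $T_c$. Then eqs. (9) to (11) reduce to partial (aperiodic) chip correlations, $R_{b_k b_l}(mT_c)/T = \frac{1}{L}\sum_j c_k[j+m]\, c_l[j]$, taken over the overlapping chips. The Python sketch below makes this integer-chip assumption and uses hypothetical function names. It returns γ = 0 for $\tau_{lk} = 0$, as stated above for $\gamma_{kk}$. A useful check: $\beta_{kl} + \gamma_{kl}$ equals the periodic cross-correlation at the same lag, because the two partial overlaps together cover one full period.

```python
import numpy as np


def aperiodic_corr(ck, cl, m):
    """(1/L) * sum_j ck[j + m] * cl[j] over overlapping chips, i.e. R_{b_k b_l}(m Tc) / T."""
    L = len(ck)
    j = np.arange(max(0, -m), min(L, L - m))
    return float(np.dot(np.asarray(ck)[j + m], np.asarray(cl)[j])) / L


def beta_gamma(ck, cl, tau_lk):
    """beta_kl and gamma_kl of eqs. (9)-(10) for tau_lk = tau_l - tau_k in whole chips."""
    L = len(ck)
    beta = aperiodic_corr(ck, cl, tau_lk)
    if tau_lk == 0:
        return beta, 0.0                  # no partial overlap with a neighbouring bit
    gamma = aperiodic_corr(ck, cl, tau_lk - L * int(np.sign(tau_lk)))
    return beta, gamma
```

For example, for the code [1, -1, 1, 1] and a one-chip delay against itself, β = -0.25 and γ = 0.25. Their sum is 0, the periodic autocorrelation at lag 1.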

Finally, eq. (7) may be written as

$$z_l[n] = \sum_{k=1}^{K} \sqrt{P_k}\, c_{kl}[n]\, a_k + A_I\, i_l[n] + n_l[n] \qquad (12)$$

where $i_l[n]$ and $n_l[n]$ are respectively the interference and noise vectors filtered by the l-th filter.

Consider now the spatial correlation matrix

$$R_{zz,l} = E\{z_l[n]\, z_l^H[n]\} = \sum_{k=1}^{K}\sum_{r=1}^{K} \sqrt{P_k P_r}\, E\{c_{kl}[n]\, c_{rl}^*[n]\}\, a_k a_r^H + A_I\, E\{i_l[n]\, i_l^H[n]\}\, A_I^H + E\{n_l[n]\, n_l^H[n]\} \qquad (13)$$

$E\{\cdot\}$ and H denote expectation and complex conjugate transpose, respectively.

Let us calculate the different terms. Assuming that the symbols are uncorrelated:

$$E\{c_{kl}[n]\, c_{rl}^*[n]\} = \delta_{kr}\left(\beta_{kl}^2 + \gamma_{kl}^2\right) \qquad (14)$$

where $\delta_{kr}$ is the Kronecker delta.

Regarding the external interference, let

$$Q_l = E\{i_l[n]\, i_l^H[n]\} = \frac{1}{T^2}\int_0^T\!\!\int_0^T S_i(u - v)\, b_l(u)\, b_l(v)\, du\, dv \qquad (15)$$

with

$$S_i(\tau) = E\{i(t + \tau)\, i^H(t)\} \qquad (16)$$

Working with expression (15), $Q_l$ can be derived as:

(17)

The spatial correlation of the filtered noise is:

$$E\{n_l[n]\, n_l^H[n]\} = \frac{\sigma^2}{T} I_N \qquad (18)$$

$I_N$ is the identity matrix with dimension N×N.

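Finally, we can write the spatial correlation matrix at the output of the first K filters as:

$$R_{zz,l} = \sum_{k=1}^{K} P_k\left(\beta_{kl}^2 + \gamma_{kl}^2\right) a_k a_k^H + A_I Q_l A_I^H + \frac{\sigma^2}{T} I_N, \qquad l = 1, \ldots, K \qquad (19)$$

and the spatial correlation matrix at the output of the (K+1)-th filter as:

$$R_{zz,K+1} = \sum_{k=1}^{K} P_k\left(\beta_{k(K+1)}^2 + \gamma_{k(K+1)}^2\right) a_k a_k^H + A_I Q_{K+1} A_I^H + \frac{\sigma^2}{T} I_N \qquad (20)$$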


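Let S be the matrix whose element at the k-th row and l-th column is $\beta_{kl}^2 + \gamma_{kl}^2$. Thus, the dimension of S is (K+1)×(K+1):

$$S = \begin{bmatrix} 1 & \beta_{12}^2 + \gamma_{12}^2 & \cdots & \beta_{1(K+1)}^2 + \gamma_{1(K+1)}^2 \\ \beta_{21}^2 + \gamma_{21}^2 & 1 & \cdots & \beta_{2(K+1)}^2 + \gamma_{2(K+1)}^2 \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{(K+1)1}^2 + \gamma_{(K+1)1}^2 & \beta_{(K+1)2}^2 + \gamma_{(K+1)2}^2 & \cdots & 1 \end{bmatrix} \qquad (21)$$

Let us also define the matrices $R_{zz}$, $R_I$, $A$ and $\mathbf{N}$ as follows:

$$R_{zz} = \begin{bmatrix} R_{zz,1} \\ \vdots \\ R_{zz,K+1} \end{bmatrix}, \quad R_I = \begin{bmatrix} A_I Q_1 A_I^H \\ \vdots \\ A_I Q_{K+1} A_I^H \end{bmatrix}, \quad A = \begin{bmatrix} P_1 a_1 a_1^H \\ \vdots \\ P_K a_K a_K^H \\ \mathbf{0} \end{bmatrix}, \quad \mathbf{N} = \mathbf{1} \otimes I_N \qquad (22)$$

where the symbol ⊗ denotes the Kronecker product, $\mathbf{0}$ is an all-zeros matrix with dimension N×N and $\mathbf{1}$ is an all-ones column vector with dimension K+1. With the previous definitions and some algebraic manipulation of eqs. (19) and (20), we can write:

$$R_{zz} = (S \otimes I_N)\, A + R_I + \frac{\sigma^2}{T}\mathbf{N} \qquad (23)$$

Operating upon matrix $R_{zz}$ we obtain a new matrix M. This new matrix can be partitioned into K+1 blocks with dimension N×N:

$$M = (S^{-1} \otimes I_N)\, R_{zz} = \begin{bmatrix} M_1 \\ \vdots \\ M_{K+1} \end{bmatrix} \qquad (24)$$

The part of M corresponding to the k-th user is

$$M_k = P_k a_k a_k^H + A_I \tilde{Q}_k A_I^H + u_k \frac{\sigma^2}{T} I_N \qquad (25)$$

$u_k$ is the k-th element of the vector $u = S^{-1}\mathbf{1}$, and $\tilde{Q}_k$ is calculated as the k-th part of the matrix:

$$\tilde{Q} = \begin{bmatrix} \tilde{Q}_1 \\ \vdots \\ \tilde{Q}_{K+1} \end{bmatrix} = (S^{-1} \otimes I_I) \begin{bmatrix} Q_1 \\ \vdots \\ Q_{K+1} \end{bmatrix} \qquad (26)$$

$I_I$ is the identity matrix with dimension I×I. Recall that I is the number of external interferences.

The last part of M, corresponding to the (K+1)-th filter, is:

$$M_{K+1} = A_I \tilde{Q}_{K+1} A_I^H + u_{K+1} \frac{\sigma^2}{T} I_N \qquad (27)$$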

$u_{K+1}$ is the (K+1)-th element of the vector u. The signal subspace component in matrix $M_{K+1}$ contains only information about the external interferences. The other submatrices contain information about the external interferences and about the signal of the corresponding user. In that case the main eigenvector, depending on the signal powers, may be quite different from the steering vector of any of the involved signals. Nevertheless, it is possible to eliminate the information relative to the external interferences in each $M_k$ with $k \neq K+1$. For this purpose, we compute from the signal subspace of $M_{K+1}$ the projection matrix onto the subspace orthogonal to the external interferences. The projection matrix is:

$$P_\perp = E_{n,K+1} E_{n,K+1}^H = I_N - E_{s,K+1} E_{s,K+1}^H = I_N - A_I (A_I^H A_I)^{-1} A_I^H \qquad (28)$$

where $E_{n,K+1}$ is the noise subspace of matrix $M_{K+1}$, whose columns are the noise subspace eigenvectors, and $E_{s,K+1}$ is the signal subspace of matrix $M_{K+1}$, whose columns are the signal subspace eigenvectors. Note that it is not necessary to calculate the steering vectors of the interferences individually (matrix $A_I$): $P_\perp$ can be computed from the global signal subspace or the global noise subspace.
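Steps (24), (27) and (28) can be checked numerically. The Python sketch below stacks the blocks $R_{zz,l}$, applies $S^{-1} \otimes I_N$ as in eq. (24), and builds $P_\perp$ from the dominant eigenvectors of $M_{K+1}$. It assumes that the number of interferences I is known. It also assumes that the interference part of $M_{K+1}$ dominates its noise part ($\tilde{Q}_{K+1}$ positive definite, $u_{K+1} > 0$). The function names are hypothetical.

```python
import numpy as np


def decouple_blocks(R_blocks, S):
    """Eq. (24): M = (S^{-1} kron I_N) R_zz, returned as the K+1 blocks M_1..M_{K+1}."""
    N = R_blocks[0].shape[0]
    M = np.kron(np.linalg.inv(S), np.eye(N)) @ np.vstack(R_blocks)
    return [M[l * N:(l + 1) * N] for l in range(len(R_blocks))]


def interference_projector(M_last, n_interf):
    """Eq. (28): P = I_N - E_s E_s^H from the n_interf dominant eigenvectors of M_{K+1}."""
    M_h = 0.5 * (M_last + M_last.conj().T)       # symmetrise against round-off
    _, E = np.linalg.eigh(M_h)                   # eigenvalues in ascending order
    Es = E[:, -n_interf:]                        # interference (signal) subspace
    return np.eye(M_last.shape[0]) - Es @ Es.conj().T
```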

The next step is to project each of the submatrices $M_k$. After the projection, each matrix $P_\perp M_k$ (k = 1, ..., K) has only one signal eigenvector, $a_{p,k}$. This eigenvector is the steering vector of the corresponding user projected onto the subspace orthogonal to the external interference. With this information, a specific beamformer that nulls both external and multiple access interference may be computed for each user. The weight vector for the k-th user is the k-th column of matrix W:

$$W = A_p (A_p^H A_p)^{-1} I_K \qquad (29)$$

with $A_p = [a_{p,1} \cdots a_{p,k} \cdots a_{p,K}]$ and $I_K$ the identity matrix with dimension K×K.
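Given $P_\perp$ and the blocks $M_k$, the projected signatures and the weights of eq. (29) follow in a few lines. This Python sketch takes $a_{p,k}$ as the dominant eigenvector of the Hermitian matrix $P_\perp M_k P_\perp$. This is an assumption made for numerical convenience: the matrix equals $P_k a_{p,k} a_{p,k}^H$ plus a noise term proportional to $P_\perp$, so its dominant eigenvector is $a_{p,k}$ up to a complex scale. Since $W^H A_p = I_K$, each weight vector passes its own user and nulls the other users. Because its columns lie in the range of $P_\perp$, it also nulls the external interference. The function names are hypothetical.

```python
import numpy as np


def projected_signature(P_perp, M_k):
    """Dominant eigenvector of P M_k P: a_{p,k} up to a complex scale factor."""
    H = P_perp @ M_k @ P_perp.conj().T
    H = 0.5 * (H + H.conj().T)                   # enforce Hermitian symmetry
    _, E = np.linalg.eigh(H)                     # eigenvalues in ascending order
    return E[:, -1]


def zero_forcing_weights(Ap):
    """Eq. (29): W = A_p (A_p^H A_p)^{-1}; column k is the weight vector of user k."""
    return Ap @ np.linalg.inv(Ap.conj().T @ Ap)
```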

4. SIMULATION RESULTS 

An asynchronous CDMA system was simulated in order to investigate the performance of the multiuser detection algorithm presented in this paper. The modulating signals in this system are Gold sequences of length L = 31.

The characteristics of an external interference depend to a large extent on its origin. It may be categorized as either broadband or narrowband relative to the bandwidth of the information-bearing signal. The proposed detector permits a capacity increase by exploiting the different spatial signatures of users. It does so even in the presence of external interferences about which the receiver has no information. As an example, we illustrate the separation of two users (K = 2) received by an array of six linearly spaced sensors (N = 6) with λ/2 spacing. We consider a jamming signal consisting of one sinusoid at a frequency equal to the carrier frequency of the CDMA signals. The angles of arrival are 35° and 45° from broadside for the system users and 70° for the jamming signal. The signal-to-noise ratio (SNR) at each of the sensors is 10, 16 and 21 dB, respectively. The relative propagation delays for the system users are $\tau_1 = 0$ and $\tau_2 = 2T_c$. Three matched filters are used at the receiver. The first and second filters are matched to the first and second user codes, each synchronized with the corresponding user. The third one does not need to be matched to any system user; in the simulation we have assumed that it is delayed by $5T_c$ with respect to the first user. The correlation matrix at the output of the matched filter bank is estimated by temporal averaging of the despread signal vector, using a block size of 100 symbols. The beamformers designed for the system users under these conditions are shown in figure 1.

Fig. 1: Beamformer designed for each active system user

If exact knowledge of the correlation matrix at the output of the matched filters were available, the estimation of the steering vector would be perfect for every user, independently of the signal powers. The limitations in the estimation are due not to the method itself but to inexact knowledge of the correlation matrix. The requirement of an accurate estimation of the matrices $R_{zz,k}$ limits the possible dynamic range of the powers of the received signals, since a weak signal may be masked by a strong one. Nevertheless, the proposed receiver offers good performance over a wide range of powers. This is illustrated in figure 2, which plots the SNIR (signal to noise plus interference ratio) at the array output (solid line) versus the power of the narrowband interference, for the first (-o-) and the second user (-+-). The SNIR of each user at the output of the classical receiver (a single matched filter) is also shown with a dashed line.
Fig. 2: Output SNIR with narrowband jamming

In the previous example we considered a narrowband interference. However, no knowledge about the jamming was used by the receiver, so the same procedure may be applied to other kinds of interference. As a second example we consider a broadband one, modeled as a BPSK signal with random chips. This situation is worse for the classical receiver, since the code gain is now reduced with respect to the previous case. In contrast, the performance of the proposed detector is practically the same as in the narrowband case.


Fig. 3: Output SNIR with broadband jamming

The code gain with respect to multiple access interference in both cases is smaller than that achieved in a synchronous system. Specifically, for our two users the gain with respect to each other is reduced from 30 dB to 11 dB for the time delays considered. As can be deduced from figs. 2 and 3, in such a situation the use of an antenna array is very useful: the rejection of the multiple access interference yields a substantial improvement of the SNIR with respect to the classical receiver. This improvement is achieved even in the presence of high-power external interference. Since the external interference is also cancelled, the improvement is more significant when the jamming signal has higher power.

5. CONCLUSIONS 

A new method for steering vector estimation in CDMA systems in the presence of interference external to the system has been proposed. The most important features of this method are summarized below.

As can be observed from the simulations, the proposed method is almost independent of the nature of the jamming signal and can cope with a very high dynamic range of interference power. The estimations are carried out at the symbol rate instead of the chip rate, without any temporal reference or any a priori spatial information. The only required information is the code waveforms and timing of the active users, which is needed anyway in order to demodulate the signals. As no model is assumed for the steering vectors, the proposed solution is robust to calibration errors. In the simulations presented here we have considered a scenario without multipath because it is easier to draw conclusions from the array beam pattern when there is only one direction of arrival. Nevertheless, since no model is assumed for the steering vector, the same procedure can be applied to estimate the generalized steering vector when the signals arrive through multiple reflections, a situation in which classical DOA estimation methods such as MUSIC fail.
ABSTRACT

This paper presents a technique for adaptive beamforming that exploits the cyclostationary properties of CPFSK modulations. The method is based on the ability of this type of modulation to generate spectral lines when raised to a fractional power equal to the inverse of its modulation index. A stochastic gradient algorithm is proposed to compute the coefficients that maximize the output SINR. The algorithm is blind because it does not need to know the transmitted symbols: only the carrier frequency, the symbol rate and the modulation index are required.

1. INTRODUCTION 

It is well known that many digitally modulated signals generate spectral lines when they pass through certain nonlinear transformations. For example, linear digital modulations such as ASK, PSK and QAM produce spectral lines at frequencies related to the carrier frequency and the symbol period when they are raised to an integer power (usually 2 or 4). This property has been successfully used in [1] to develop a blind adaptive beamforming technique that only requires knowledge of one of the frequencies of the spectral lines generated by the desired signal. This technique consists of adjusting the beamformer coefficients to minimize the Mean Square Error between the array output after the nonlinearity and a complex exponential. The advantages of this technique are remarkable: it is not necessary to know the desired signal steering vector (therefore it is very robust to array calibration errors) [2], it does not require a reference signal [2], it does not suffer from capture problems as the Constant Modulus (CM) beamformer does [3], and it can be easily implemented with a stochastic gradient algorithm without solving generalized eigenvalue problems [4].

This work has been supported by CICYT, Spain, grant # 
TIC96-0500-C10-02 


The beamforming technique proposed in [1] only considers integer powers and therefore its applicability is limited to linear modulations. The main objective of this paper is to explain how to extend this approach to CPFSK (Continuous Phase Frequency Shift Keying) [5], a modulation that belongs to the class of nonlinear modulations with memory. CPFSK signals do not generate spectral lines when they are raised to an integer power, but it will be demonstrated that they do when raised to a fractional power equal to the inverse of their modulation index. Therefore, the beamformer coefficients can also be selected to minimize the Mean Square Error between the array output after the nonlinearity and a complex exponential. The differences with respect to the integer case are that the implementation of the adaptive algorithm requires a phase unwrapping algorithm, and that the analysis of the stationary points becomes extremely difficult because fractional-order moments appear. Nevertheless, simulations show that the performance is very similar to that of the integer case.

This paper is organized in five sections. Section 2 presents the spectral line generation property of CPFSK signals. This property is used in section 3 to develop an optimization criterion for blind adaptive beamforming. In section 4 simulation results are presented. Finally, section 5 is devoted to the conclusions.

2. SPECTRAL LINE GENERATION 

A zero-mean complex signal x(t) generates a spectral line at a frequency α after passing through the nonlinearity (·)^r if and only if the r-th order cyclic moment, defined as

$$m_{rx}^{\alpha} = \left\langle x^{r}(t)\, e^{-j2\pi\alpha t} \right\rangle \qquad (1)$$

exists and is nonzero [6]. The operator ⟨·⟩ denotes the time average operation and r is a real number, not necessarily an integer.

The existence of these cyclic moments arises from the cyclostationarity properties of x(t). More specifically, it can be demonstrated [6] that, under certain conditions, the cyclic moments of x(t) correspond to the Fourier series coefficients of its statistical moments

$$m_{rx}(t) = E\left[x^{r}(t)\right] = \int_{-\infty}^{\infty} x^{r} f(x,t)\, dx \qquad (2)$$

where f(x,t) is the first-order density function of x(t) and E[·] denotes the statistical average operation.

In the sequel we calculate the statistical moment of order r = 1/h for a CPFSK signal with modulation index h, and its Fourier series coefficients. A CPFSK signal is represented as

$$x(t) = A\cos\left(2\pi f_c t + 2\pi h \int_{-\infty}^{t} d(\tau)\, d\tau + \phi_0\right) \qquad (3)$$

where A is the signal amplitude, $f_c$ is the carrier frequency, h is the modulation index, $\phi_0$ is the initial phase of the carrier and d(t) is a PAM signal given by

$$d(t) = \sum_{k} I_k\, g(t - kT) \qquad (4)$$

where g(t) is a rectangular pulse of amplitude 1/(2T) and duration T, and the symbols $I_k$ are the amplitudes that result from mapping digits of the information sequence to the amplitude levels {±1, ±3, ..., ±(M−1)}. In the interval nT ≤ t < (n+1)T the complex representation of x(t) can be written as [5]

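$$s(t) = A\exp\left\{j\left[2\pi f_c t + \pi h \phi_n + \pi h I_n \frac{t - nT}{T} + \phi_0\right]\right\} \qquad (5)$$

where

$$\phi_n = \sum_{k=0}^{n-1} I_k \qquad (6)$$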

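Using this expression, the (1/h)-th order moment of s(t) is given by

$$E\left[s^{1/h}(t)\right] = A^{1/h} e^{j(2\pi f_c t/h + \phi_0/h)}\, E\left[e^{j\pi\phi_n}\, e^{j\pi I_n (t-nT)/T}\right] \qquad (7)$$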

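Assuming that the symbols $I_n$ are independent and identically distributed, $\phi_n$ is independent of $I_n$ and therefore

$$E\left[s^{1/h}(t)\right] = A^{1/h} e^{j(2\pi f_c t/h + \phi_0/h)}\, E\left[e^{j\pi\phi_n}\right] E\left[e^{j\pi I_n (t-nT)/T}\right] \qquad (8)$$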

Now recall that the term $\phi_n$ is the sum of n odd numbers $I_k$. Therefore, $\phi_n$ is an even number when n is even and an odd number when n is odd, and as a consequence

$$e^{j\pi\phi_n} = e^{j\pi n} = (-1)^n$$

This fact enables us to express the first average in (8) as follows

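$$E\left[e^{j\pi\phi_n}\right] = \sum_{\phi_n} e^{j\pi\phi_n}\,\mathrm{Prob}(\phi_n) = e^{j\pi n}\sum_{\phi_n}\mathrm{Prob}(\phi_n) = e^{j\pi n} \qquad (9)$$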


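Let us calculate the second average in (8). The symbols $I_n$ take values in a set of equiprobable points, i.e., Prob{$I_n$ = M − 1 − 2k} = 1/M, k = 0, ..., M−1. Therefore, this average takes the following form

$$E\left[e^{j\pi I_n (t-nT)/T}\right] = \frac{1}{M}\sum_{k=0}^{M-1} e^{j\pi(M-1-2k)(t-nT)/T} = \frac{1}{M}\sum_{k=0}^{M-1} e^{-j\pi(M-1-2k)n}\, e^{j\pi(M-1-2k)t/T} \qquad (10)$$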

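Finally, substituting (9) and (10) into (8), and noting that $e^{j\pi n}e^{-j\pi(M-1-2k)n} = e^{j\pi n(2+2k-M)} = 1$ because M is even, we obtain the (1/h)-th order moment of s(t)

$$E\left[s^{1/h}(t)\right] = \frac{A^{1/h}}{M}\, e^{j\phi_0/h} \sum_{k=0}^{M-1} \exp\left\{j2\pi\left(\frac{f_c}{h} + \frac{M-1-2k}{2T}\right)t\right\} \qquad (11)$$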

It is apparent that this moment is a periodic function of time, since it has the form of a Fourier series expansion. As already mentioned, the coefficients of this expansion are the cyclic moments. This implies that a CPFSK signal has nonzero (1/h)-th order cyclic moments at the following frequencies

$$f_i = \frac{f_c}{h} + \frac{M-1-2i}{2T}, \qquad i = 0, \ldots, M-1 \qquad (12)$$

and their value is

$$m_{\frac{1}{h}s}^{f_i} = \frac{A^{1/h}}{M}\, e^{j\phi_0/h}, \qquad i = 0, \ldots, M-1 \qquad (13)$$

This result demonstrates that, after passing the signal s(t) through the nonlinearity $(\cdot)^{1/h}$, M spectral lines are obtained at the frequencies $f_i$ given in (12), with amplitude $A^{1/h}/M$. To illustrate this property, figure 1 shows the power spectral density (PSD) of a four-level CPFSK signal with modulation index h = 0.75 before and after the nonlinearity $(\cdot)^{1/h}$. It can be seen that four spectral lines appear after applying the nonlinearity.

The following section shows how this property can be used to optimally extract a signal from the array output while minimizing the effect of noise and interference.
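The moment calculation of (7)-(13) can be checked numerically. The sketch below is a simulation under stated assumptions (symbol period T = 1, 8 samples per symbol, an arbitrary carrier of 0.1 cycles per symbol, $\phi_0 = 0.3$, A = 1): it builds the continuous CPFSK phase directly, so no unwrapping is needed, and time-averages $s^{1/h}(t)e^{-j2\pi f t}$ at the four frequencies of (12) and at one frequency between them.

```python
import numpy as np


def cpfsk_excess_phase(symbols, h, sps):
    """Unwrapped phase pi*h*(phi_n + I_n*(t - nT)/T), with T = 1 and sps samples per symbol."""
    increments = np.repeat(np.pi * h * symbols / sps, sps)   # phase step of each sample
    return np.concatenate(([0.0], np.cumsum(increments)[:-1]))


def cyclic_moment(z, alpha, t):
    """Time-average estimate <z(t) exp(-j 2 pi alpha t)>, cf. eq. (1)."""
    return np.mean(z * np.exp(-2j * np.pi * alpha * t))


rng = np.random.default_rng(0)
M, h, sps, n_sym = 4, 0.75, 8, 20000        # four levels and h = 0.75, as in Figure 1
fc, phi0, A = 0.1, 0.3, 1.0                 # assumed carrier (cycles/symbol), phase, amplitude
I = rng.choice(np.arange(-(M - 1), M, 2), n_sym)          # levels -3, -1, 1, 3
t = np.arange(n_sym * sps) / sps                           # time in symbol periods
theta = 2 * np.pi * fc * t + cpfsk_excess_phase(I, h, sps) + phi0
s_pow = A ** (1 / h) * np.exp(1j * theta / h)   # s(t)**(1/h) on the continuous-phase branch

f_lines = fc / h + (M - 1 - 2 * np.arange(M)) / 2.0       # eq. (12) with T = 1
moments = np.array([cyclic_moment(s_pow, f, t) for f in f_lines])
off_line = cyclic_moment(s_pow, fc / h, t)                # a frequency between the lines
print(np.abs(moments))   # each entry should be close to A**(1/h)/M = 0.25
print(abs(off_line))     # should be close to zero
```

Because the phase is generated on the continuous branch, the estimate at each $f_i$ equals $e^{j\phi_0/h}$ times the fraction of symbols equal to $M-1-2i$, which tends to 1/M as in (13).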
Figure 1: PSD of CPFSK with h = 0.75 before and after the nonlinearity

3. OPTIMIZATION CRITERION 

Let us consider a narrowband beamformer which processes an input vector x(t) to produce an output that can be expressed as

$$y(t) = \mathbf{w}^H \mathbf{x}(t) \qquad (14)$$

where w is a complex-valued coefficient vector and ^H denotes the conjugate transpose operator.

We propose to adjust the coefficients w by minimizing the following cost function

$$J(\mathbf{w}) = \left\langle \left| e^{j2\pi\alpha n} - y^{1/h}(n) \right|^2 \right\rangle \qquad (15)$$

where ⟨·⟩ denotes the time average operator and α is one of the frequencies $f_i$ at which the desired signal generates a spectral line after the nonlinearity $(\cdot)^{1/h}$. Similarly to [1], we claim that minimization of this cost function leads to optimal extraction of the desired signal in the sense that the Signal to Interference and Noise Ratio (SINR) is maximized.

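A simple and reasonable way to compute the optimum coefficients w is the steepest descent method

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu \nabla_{\mathbf{w}^*} J(n) \qquad (16)$$

where μ is the algorithm step size and $\nabla_{\mathbf{w}^*}J$ represents the complex gradient of J with respect to w* at instant n. In our particular case $\nabla_{\mathbf{w}^*}J$ is

$$\nabla_{\mathbf{w}^*} J = -\frac{1}{h}\left\langle e^*(n)\, y^{1/h-1}(n)\, \mathbf{x}(n) \right\rangle \qquad (17)$$

where $e(n) = e^{j2\pi\alpha n} - y^{1/h}(n)$ is the error signal whose power we want to minimize and * denotes the conjugate operator. Replacing the time average in (17) by its instantaneous estimate, and absorbing the constant factor 1/h into μ, we obtain the following stochastic gradient algorithm

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e^*(n)\, y^{1/h-1}(n)\, \mathbf{x}(n) \qquad (18)$$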

It is interesting to note that the implementation of this algorithm only requires the modulation index h and the frequency α. Since, from (12), the value of α depends on h, $f_c$ and T, only these three parameters are required to extract the desired signal. The algorithm is blind because knowledge of the transmitted symbols is not required.
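A per-snapshot sketch of the update (18) is given below. The `FractionalPower` helper is hypothetical: it unwraps the output phase by keeping each new phase within π of the previous one, a simple stand-in for the unwrapping algorithm of [7] that assumes the output phase changes by less than π per sample. The frequency α is expressed in cycles per sample, and the 1/h factor of the gradient is absorbed into μ as in (18).

```python
import numpy as np


class FractionalPower:
    """Hypothetical helper: raises successive samples to a real power r on a
    continuous phase branch (simple sample-to-sample unwrapping)."""

    def __init__(self, r):
        self.r = r
        self.phase = None          # last unwrapped phase, in radians

    def __call__(self, z, commit=True):
        p = np.angle(z)            # principal value ARG(z)
        if self.phase is not None:
            # pick the 2*pi*k offset that keeps p within pi of the previous phase
            p = self.phase + np.angle(np.exp(1j * (p - self.phase)))
        if commit:
            self.phase = p
        return np.abs(z) ** self.r * np.exp(1j * self.r * p)


def spectral_line_update(w, x, n, alpha, mu, power):
    """One step of eq. (18): w <- w + mu * conj(e) * y**(1/h - 1) * x."""
    y = np.vdot(w, x)                            # y(n) = w^H x(n)
    y_r = power(y)                               # y(n)**(1/h), unwrapped
    e = np.exp(2j * np.pi * alpha * n) - y_r     # error e(n), alpha in cycles/sample
    # y**(1/h - 1) evaluated on the same branch as y**(1/h)
    return w + mu * np.conj(e) * (y_r / y) * x, e
```

Computing $y^{1/h-1}$ as $y^{1/h}/y$ keeps both powers on the same phase branch without a second unwrapping step; in a full beamformer the same `FractionalPower` instance would be reused across snapshots so that its stored phase tracks the output.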

Implementation of (18) typically requires raising a complex number to a fractional exponent. This operation is carried out as follows

$$z^{r} = |z|^{r} e^{j r \arg(z)} \qquad (19)$$

where r is any real number. It should be mentioned that a conventional arctangent subroutine cannot be used to compute arg(z). Such subroutines give the principal value of the phase of a complex number, ARG(z), which lies in the interval [−π, π]. The relationship between the true value of the phase, arg(z), and its principal value is

$$\arg(z) = \mathrm{ARG}(z) + 2\pi k \qquad (20)$$

where k is an integer. However, note that for an arbitrary real number r

$$e^{j r \arg(z)} = e^{j r\,\mathrm{ARG}(z)}\, e^{j2\pi k r} \qquad (21)$$

where the factor $e^{j2\pi k r}$ differs from 1 unless kr is an integer. Therefore, the principal value of the phase cannot be used, and computing arg(z) requires a phase unwrapping algorithm such as the one described in [7].
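The effect described by (21) is easy to reproduce. In the sketch below NumPy's `np.unwrap` plays the role of the phase unwrapping algorithm (it is not the method of [7] and assumes consecutive phase samples differ by less than π); the samples are unit-modulus points whose phase grows linearly, and the exponent corresponds to an assumed h = 0.6.

```python
import numpy as np


def max_phase_step(w):
    """Largest phase jump between consecutive samples, in radians."""
    return np.max(np.abs(np.angle(w[1:] / w[:-1])))


r = 1 / 0.6                                   # exponent 1/h for h = 0.6
n = np.arange(200)
z = np.exp(1j * 0.1 * n)                      # phase grows by 0.1 rad per sample, wrapping past pi
reference = np.exp(1j * r * 0.1 * n)          # z**r on the continuous branch

principal = np.abs(z) ** r * np.exp(1j * r * np.angle(z))               # uses ARG(z)
unwrapped = np.abs(z) ** r * np.exp(1j * r * np.unwrap(np.angle(z)))    # uses arg(z)

print(max_phase_step(unwrapped), max_phase_step(principal))
```

The principal-value version jumps by about 2.26 rad each time ARG(z) wraps (the factor $e^{-j2\pi r}$ of (21)), while the unwrapped version advances smoothly by r·0.1 rad per sample.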

The cost function that we are minimizing is not a quadratic form of w. This raises the question of whether there are undesirable stationary points that may impair the convergence of the adaptive algorithm. The analysis in [1] shows that for h = 0.5 (for which 1/h = 2 is an integer) the cost function (15) is free of undesirable minima except when the interferences generate a spectral line at the same frequency as the desired signal. This particular case of CPFSK is known as Minimum-Shift Keying (MSK) [5], and it is very common in communications due to its greater bandwidth efficiency. The analysis for other modulation indices involves fractional-order statistics, turns out to be much more complicated, and has not been performed. However, simulations did not show the existence of undesirable minima.

4. SIMULATIONS 

Several computer simulations were carried out to illustrate the performance of the proposed method. We considered a uniform linear array with 10 sensors equispaced at half a wavelength. The array input signals are sampled at a rate ten times the symbol rate.

In the first simulation example, a simple environment with one binary CPFSK signal and Gaussian noise was considered. Its input SNR is 5 dB and its direction of arrival (DOA) is 0°. Figures 2 and 3 plot the time evolution of the SINR for different values of the modulation index. In all cases the algorithm step size is set to 2 × 10^-…. It can be seen that in all cases the algorithm converges to the maximum SNR solution. However, the rate of convergence strongly depends on the value of h: simulations showed that convergence is faster for values of h close to 0.5 and 1.

In the second simulation example we considered an environment with three incoming binary CPFSK signals whose parameters are listed in table 1. The three have the same modulation index (h = 0.6) but different carrier frequencies. Figure 4 shows the time evolution of the SINR in this environment with an algorithm step size μ = 5 × 10^-…. It can be seen, again, that the algorithm converges to the maximum SINR solution.


Signal        Carrier Freq.   SNR     DOA
Desired       0.0             0 dB    0°
Interf. #1    0.1             10 dB   30°
Interf. #2    0.2             20 dB   (illegible)

Table 1: Environment parameters

To illustrate the ability of the algorithm to cancel interferences that generate spectral lines at the same frequency as the desired signal, a third computer experiment was carried out in which we considered the same environment as before but with the interferences having the same carrier frequency as the desired signal. Figure 5 shows the time evolution of the SINR for this case. It can be seen that the misadjustment noise of the algorithm has increased, but the algorithm still converges to the maximum SINR solution.


Figure 2: Time evolution of SNR for different values of modulation index.
Figure 3: Time evolution of SNR for different values of modulation index (cont.).

Figure 4: Algorithm performance in an interference environment (different carrier frequencies).

Figure 5: Algorithm performance in an interference environment (same carrier frequency).

5. CONCLUSIONS

This paper presents a technique for adaptive beamforming that exploits the cyclostationary properties of CPFSK signals. The method uses the ability of this kind of modulated signal to generate spectral lines when raised to a fractional power equal to the inverse of its modulation index. The approach is based on minimizing a cost function defined as the Mean Square Error between the array output raised to the inverse of the modulation index and a complex exponential. The frequency of this exponential is one of the frequencies of the spectral lines generated by the desired signal. The approach is blind because the transmitted symbols are not required: only the carrier frequency, the symbol rate and the modulation index are needed to extract the desired signal.

The analysis of the stationary points of the proposed cost function is very complicated because it involves fractional-order statistics, and it has not been performed. However, simulations have shown that in the fractional case the behavior is similar to that of the integer case.
ABSTRACT*

The Constant Modulus Array has a slow rate of convergence, mainly due both to the nonconvex nature of its cost function and to the well-known behavior of stochastic steepest descent algorithms in environments with a large eigenvalue spread. In this paper we analyze the solutions of the Constant Modulus Cost Function, showing that the weight vectors associated with its minima lie in the signal subspace. From that information we develop a modified version of the Constant Modulus Array, which speeds up the convergence and reduces the final misadjustment error. The proposed method is especially useful for arrays having a large number of sensors and a low Signal to Noise Ratio for the Source of Interest.

1. INTRODUCTION 

The Constant Modulus Array (CMA) was introduced in 1986 by Gooch and Lundell [1], who suggested applying the Constant Modulus (CM) criterion, originally designed by Godard [2], to the field of adaptive antennas. Due to its interesting properties (low computational load, independence of the array manifold, etc.) it has probably become the most popular blind beamforming scheme. However, it is far from being free of drawbacks. One of its main shortcomings is its slow convergence, which sometimes makes it inapplicable in practical environments, especially when the Signal to Noise Ratio of the incoming signals is poor. This is the problem we address in this paper. As we will show, the


* This work has been partially supported by the National Research Plan of Spain CICYT, under Grant TIC-95-1022-C05-01


analysis of the extrema of the cost function will reveal useful information about the location of the minima and its relationship with the signal subspace. This fact can be exploited to speed up the convergence of the algorithm.

The paper is organized as follows: in section 2 we review the constant modulus cost function, introducing the appropriate vector notation; section 3 is devoted to analyzing the nature of the solutions and its consequences. In section 4 we propose a novel technique that exploits the results of this analysis and is designed to avoid some of the undesirable properties of the original proposal. Section 5 shows the relationship between the proposed technique and the Generalized SideLobe Canceller (GSLC). Simulation results of the suggested algorithm, compared with other approaches, are shown in section 6. Finally, we end the paper by paying attention to specific scenarios where the proposed algorithm can be especially useful.

2. THE CONSTANT MODULUS ARRAY 

The CMA is one of the simplest blind beamforming schemes. Although, as happens with the Sato and Decision Directed (DD) algorithms, it can be seen as a particular case of the more general family of Bussgang algorithms, it was developed independently by Godard and later applied to array processing by Gooch and Lundell. The algorithm tries to minimize a nonconvex cost function designed to penalize deviations in the envelope of the array output signal:

$$J = E\left\{\left(|y[n]|^2 - 1\right)^2\right\} \qquad (1)$$

where y[n] is the output of the array, obtained as a linear combination of the received signals,

$$y[n] = \mathbf{w}^H \mathbf{x}[n] \qquad (2)$$

where x[n] is the received snapshot and w is the weight vector, which describes the spatial response of the array. The superscript H denotes the Hermitian operator, i.e., transpose and conjugate.

The underlying idea of the mathematical formulation is very simple: the Source Of Interest (SOI) is assumed to have the constant envelope property¹. This property is lost due to the contribution of noise and/or interfering processes to the output signal y[n]. The algorithm tries to remove noise and interference indirectly by restoring the lost property.


The optimization procedure follows a simple, LMS-like, stochastic steepest descent algorithm which yields a weight vector adaptation equation given by:

$$\mathbf{w}[n+1] = \mathbf{w}[n] - \mu\left(|y[n]|^2 - 1\right) y^*[n]\, \mathbf{x}[n] \qquad (3)$$

where μ is the step-size parameter, whose choice is a trade-off between convergence speed and misadjustment noise. Probably the most useful result about the optimum value of μ is given by Katia and Duhamel [3], who suggest selecting it as:

$$\mu = \frac{\mu_0}{\|\mathbf{x}[n]\|^2\, |y[n]|\left(|y[n]| + 1\right)} \qquad (4)$$

where $\mu_0$ is the normalized step-size parameter, which must lie in the interval (0, 1]. The resulting algorithm is called the Normalized CMA (NCMA).
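The NCMA recursion (3)-(4) fits in a few lines. The sketch assumes $y[n] \neq 0$ and uses random data only to exercise the update. A property worth checking is that with $\mu_0 = 1$ the step (4) makes the a posteriori output $\mathbf{w}^H[n+1]\mathbf{x}[n]$ exactly unit-modulus, since $y' = y\,(1-\mu\|\mathbf{x}\|^2(|y|^2-1)) = y/|y|$.

```python
import numpy as np


def ncma_step(w, x, mu0):
    """One normalized CMA update, eqs. (3)-(4); assumes y[n] != 0."""
    y = np.vdot(w, x)                                      # y[n] = w^H x[n]
    mod = abs(y)
    mu = mu0 / (np.vdot(x, x).real * mod * (mod + 1.0))    # step size of eq. (4)
    return w - mu * (mod ** 2 - 1.0) * np.conj(y) * x, y


rng = np.random.default_rng(2)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)
w_new, y = ncma_step(w, x, mu0=1.0)
y_post = np.vdot(w_new, x)          # a posteriori output for the same snapshot
print(abs(y), abs(y_post))          # abs(y_post) equals 1 up to rounding
```

Values of $\mu_0 < 1$ only move the a posteriori modulus part of the way towards one, which trades convergence speed for lower misadjustment.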


3. SOLUTIONS OF THE CONSTANT MODULUS 
COST FUNCTION AND ITS NATURE 

In a general environment, the snapshot x[n] is composed of two terms,

$$\mathbf{x}[n] = \sum_{m=1}^{M} \mathbf{d}_m s_m[n] + \mathbf{n}[n] \qquad (5)$$

where M is the number of incoming signals, $\mathbf{d}_m$ is the generalized steering vector (including possible multipath effects), $s_m[n]$ represents the m-th signal and n[n] models the noise (usually thermal noise) generated at the sensors. Under this assumption, y[n] can be rewritten as:

$$y[n] = \sum_{m=1}^{M} \mathbf{w}^H \mathbf{d}_m s_m[n] + \mathbf{w}^H \mathbf{n}[n] = \sum_{m=1}^{M} g_m s_m[n] + n_y[n] \qquad (6)$$

The definitions $g_m = \mathbf{w}^H\mathbf{d}_m$ (m = 1..M) and $n_y[n] = \mathbf{w}^H\mathbf{n}[n]$ are implicit in the above equation. Substituting (6) into (1) and manipulating the expression, we finally find:

$$J = f\left(g_1, \ldots, g_M, k_1, \ldots, k_M, \sigma_1, \ldots, \sigma_M, P_n\right) \qquad (7)$$

where $k_m$ (m = 1..M) represents the kurtosis² of the m-th signal, $\sigma_m$ (m = 1..M) is the standard deviation of the m-th signal, and $P_n$ is the noise power at the array output,

$$P_n = \mathbf{w}^H \mathbf{R}_{nn} \mathbf{w} \qquad (8)$$


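To determine the behavior of the CM algorithm we need to find the extrema of J by solving the following vector equation:

$$\nabla_{\mathbf{w}} J = \mathbf{0} \qquad (9)$$

However, taking into account expression (7) and the relationship between the set of coefficients ($g_1$, ..., $g_M$, $P_n$) and the weight vector w, it is possible to apply the chain rule to equation (9), obtaining:

$$\nabla_{\mathbf{w}} J = \sum_{m=1}^{M} \frac{\partial J}{\partial g_m}\nabla_{\mathbf{w}} g_m + \frac{\partial J}{\partial P_n}\nabla_{\mathbf{w}} P_n = \sum_{m=1}^{M} \frac{\partial J}{\partial g_m}\mathbf{d}_m + \frac{\partial J}{\partial P_n}\mathbf{R}_{nn}\mathbf{w} = \mathbf{0} \qquad (10)$$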


Equation (10) has two sets of solutions:

1. If ∂J/∂$P_n$ = 0, then the first term must also be zero; if we assume that the generalized steering vectors $\mathbf{d}_m$ are linearly independent, the only valid solution is then given by ∂J/∂$g_m$ = 0 for all m. It is possible to demonstrate that this condition implies $g_m$ = 0 for all m and, consequently, the weight vector associated with this solution lies completely in the noise subspace. It is also not difficult to demonstrate that this solution is a minimum if and only if all the incoming signals have a kurtosis larger than two, which is not the case for communications signals. Thus, this solution may be classified as an unwanted one.

2. If ∂J/∂$P_n$ ≠ 0, then we can rewrite eq. (10) to read as follows:

$$\mathbf{w} = -\left(\frac{\partial J}{\partial P_n}\right)^{-1}\sum_{m=1}^{M}\frac{\partial J}{\partial g_m}\mathbf{R}_{nn}^{-1}\mathbf{d}_m = \sum_{m=1}^{M} c_m \mathbf{R}_{nn}^{-1}\mathbf{d}_m \qquad (11)$$


¹ This property is shared by many man-made communications signals (e.g., PSK and FSK modulations, among others).

² The kurtosis of a signal is defined as the quotient between its fourth-order moment and the squared second-order moment.
From observation of eq. (11) we can conclude that this set of solutions results in a linear combination of the eigenvectors associated with the signal subspace. A special case of interest is encountered under the classical hypothesis $\mathbf{R}_{nn} = \sigma_n^2\mathbf{I}$. In this situation, the desired solutions of the CM algorithm are linear combinations of the generalized steering vectors of the incoming signals.
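The kurtosis condition in item 1 above can be checked numerically. The sketch takes the kurtosis of a complex signal as $E|s|^4/(E|s|^2)^2$, which is an assumption about how the fourth-order/second-order definition applies to complex signals, and compares a constant-modulus QPSK sequence with circular complex Gaussian noise.

```python
import numpy as np


def kurtosis(s):
    """Fourth-order moment over squared second-order moment: E|s|^4 / (E|s|^2)^2."""
    p2 = np.mean(np.abs(s) ** 2)
    return np.mean(np.abs(s) ** 4) / p2 ** 2


rng = np.random.default_rng(3)
n = 200_000
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n)))   # constant modulus
gauss = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
print(kurtosis(qpsk), kurtosis(gauss))   # about 1 and about 2
```

A constant-modulus signal gives exactly 1 and Gaussian noise gives about 2, so PSK sources do not satisfy the "larger than two" condition under which the noise-subspace solution would be a minimum.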

4. THE PROPOSED ALGORITHM 

From the performed analysis it is clear that we can directly avoid unwanted solutions if the number of sources present in the signal scenario is "approximately" known. We have quoted the word "approximately" because our proposal does not need to know the exact number of incoming sources; it is enough to provide the maximum expected number of simultaneous sources. We will denote this number by S (S ≥ M).

Under this condition the proposed algorithm is summarized as follows:

1. Estimate $\mathbf{R}_{xx} = E\{\mathbf{x}[n]\mathbf{x}^H[n]\}$ from the received data.

2. Solve the generalized eigenvalue problem $\mathbf{R}_{xx}\mathbf{v}_i = \lambda_i\mathbf{R}_{nn}\mathbf{v}_i$ for all i = 1..N, N being the number of sensors.

3. Extract the N−S least significant eigenvectors and form the new matrix:

$$\mathbf{V} = \left[\mathbf{v}_{S+1}, \mathbf{v}_{S+2}, \ldots, \mathbf{v}_N\right] \qquad (12)$$

where we assume that the eigenvectors $\mathbf{v}_i$ have been previously normalized in step 2.

4. Solve the linearly constrained problem:

$$\min_{\mathbf{w}} J = E\left\{\left(|y[n]|^2 - 1\right)^2\right\} \quad \text{subject to} \quad \mathbf{w}^H \mathbf{R}_{nn}\mathbf{V} = \mathbf{0} \qquad (13)$$

where the introduction of a set of N−S constraints will improve the convergence rate of the adaptive process.

The developed algorithm is designed to work over 
blocks of data rather than on a sample by sample basis, 
although it is possible, if required, to follow an adaptive 
procedure for obtaining the eigenvectors of the data 
correlation matrix [4]. 
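Steps 2 and 3 can be sketched with NumPy alone by whitening with a Cholesky factor of $\mathbf{R}_{nn}$, which turns the generalized problem into an ordinary Hermitian one. For illustration the sketch uses an exact model correlation matrix instead of the estimate of step 1, two hypothetical sources at 0° and 20° on an 8-sensor λ/2 array, white noise and S = 3 ≥ M. It checks that the steering vectors lie in the span of the S most significant eigenvectors and that these are $\mathbf{R}_{nn}$-orthogonal to the matrix V of (12).

```python
import numpy as np


def generalized_eig_desc(Rxx, Rnn):
    """Solve Rxx v = lambda Rnn v for Hermitian Rxx and positive definite Rnn.

    Eigenvalues are returned in descending order and the eigenvectors are
    normalized so that V^H Rnn V = I (step 2)."""
    L = np.linalg.cholesky(Rnn)                 # Rnn = L L^H
    Linv = np.linalg.inv(L)
    lam, U = np.linalg.eigh(Linv @ Rxx @ Linv.conj().T)   # ascending order
    V = Linv.conj().T @ U                       # v = L^{-H} u
    order = np.argsort(lam)[::-1]
    return lam[order], V[:, order]


def split_subspaces(Rxx, Rnn, S):
    """B: S most significant eigenvectors; V: the N-S least significant ones (eq. (12))."""
    _, V = generalized_eig_desc(Rxx, Rnn)
    return V[:, :S], V[:, S:]


N, S = 8, 3                                     # S >= M = 2 sources
n = np.arange(N)
A = np.exp(-1j * np.pi * np.outer(n, np.sin(np.deg2rad([0.0, 20.0]))))
Rnn = np.eye(N)
Rxx = A @ np.diag([2.0, 10.0]) @ A.conj().T + Rnn   # exact model instead of step 1
B, V = split_subspaces(Rxx, Rnn, S)

residual = A - B @ np.linalg.lstsq(B, A, rcond=None)[0]   # steering vectors in span(B)?
print(np.linalg.norm(residual), np.linalg.norm(B.conj().T @ Rnn @ V))
```

Any weight vector of the form $\mathbf{w} = \mathbf{B}\mathbf{w}_a$ therefore satisfies the constraint $\mathbf{w}^H\mathbf{R}_{nn}\mathbf{V} = \mathbf{0}$ of (13).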


5. REFORMULATION OF THE PROPOSED 
TECHNIQUE AND THE GSLC 

Equation (13) poses a constrained optimization problem. A first approach is to employ the Frost algorithm, preprocessing all snapshots to remove components lying in the subspace spanned by V. However, this approach does not exploit the rank reduction of the subspace to reduce the number of computations. A general solution to the optimization of the Constant Modulus Cost Function subject to linear constraints was given by Griffiths [5]. However, in our special case, as all constraints are equal to zero, the formulation can be further simplified. Bearing in mind the structure of the GSLC beamformer, shown in figure 1, and how it works, it is clear that the simplest possible choice for $\mathbf{w}_o$ is:

$$\mathbf{w}_o = \mathbf{0} \qquad (14)$$

which avoids the need to perform any computation related to the upper branch of the beamformer. The blocking matrix B must be orthogonal to the constraints. If, under typical conditions, the noise is assumed to be uncorrelated between sensors and to have equal power in all of them, the orthogonality condition for B can be written in terms of V as:

$$\mathbf{B}^H \mathbf{V} = \mathbf{0} \qquad (15)$$

Thus, the columns of B must lie in the signal subspace, and B can be chosen as:

$$\mathbf{B} = \left[\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_S\right] \qquad (16)$$

where $\mathbf{v}_i$ is the i-th most significant eigenvector of $\mathbf{R}_{xx}$.



Figure 1 - Structure of the GSLC beamformer

Steps 3 and 4 of the algorithm proposed in section 4 must be modified according to the new formulation:

3. Extract the S most significant eigenvectors and form the matrix B as described in eq. (16):

$$\mathbf{B} = \left[\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_S\right] \qquad (17)$$


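4. For every new snapshot x[n]:

Project it onto the signal subspace to obtain z[n],

$$\mathbf{z}[n] = \mathbf{B}^H \mathbf{x}[n] \qquad (18)$$

Update the weight vector $\mathbf{w}_a[n]$ following:

$$\mathbf{w}_a[n+1] = \mathbf{w}_a[n] - \mu_0 \frac{\left(|y[n]|^2 - 1\right) y^*[n]}{\|\mathbf{z}[n]\|^2\, |y[n]|\left(|y[n]| + 1\right)}\, \mathbf{z}[n], \qquad y[n] = \mathbf{w}_a^H[n]\,\mathbf{z}[n] \qquad (19)$$

where the normalized version of the CMA is preferred for the adaptation process.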
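Step 4 of the reformulated algorithm can be sketched as below. The basis B is a random orthonormal placeholder here (in practice it would come from (16)), and the values N = 30 and S = 3 are illustrative. The checks confirm that the reduced-dimension output equals the output of the full-array weight $\mathbf{w} = \mathbf{B}\mathbf{w}_a$, and that with $\mu_0 = 1$ the normalized step sets the a posteriori modulus to one.

```python
import numpy as np


def subspace_ncma_step(wa, B, x, mu0):
    """Step 4 of the reformulated algorithm: project with eq. (18), update with eq. (19)."""
    z = B.conj().T @ x                                   # z[n] = B^H x[n]
    y = np.vdot(wa, z)                                   # y[n] = wa^H z[n]
    mod = abs(y)
    mu = mu0 / (np.vdot(z, z).real * mod * (mod + 1.0))  # normalized step size
    return wa - mu * (mod ** 2 - 1.0) * np.conj(y) * z, y


rng = np.random.default_rng(4)
N, S = 30, 3                                        # 30 sensors as in section 6; S assumed
G = rng.standard_normal((N, S)) + 1j * rng.standard_normal((N, S))
B, _ = np.linalg.qr(G)                              # placeholder orthonormal signal-subspace basis
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
wa = rng.standard_normal(S) + 1j * rng.standard_normal(S)

wa_new, y = subspace_ncma_step(wa, B, x, mu0=1.0)
print(abs(np.vdot(B @ wa, x) - y))                  # full-array weight w = B wa gives the same output
```

The adaptive part works on S-dimensional vectors instead of N-dimensional ones, which is where the reformulation saves computations when S is much smaller than N.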


6. SIMULATIONS

The signal scenario chosen is shown in table 1. The selected array is linear, having 30 omnidirectional sensors with half-wavelength spacing between adjacent sensors.

# of signal   Signal type    Angle of arrival   Input SNR
#1            4-PSK          (illegible)        -5 dB
#2            8-PSK          0°                 -5 dB
#3            Tone, f=0.1    20°                10 dB

Table 1 - Signal scenario for the simulations


In figure 2 we can observe the evolution of the output SINR for both algorithms, the proposed one and the classical NCMA, when they are optimally initialized. The signal subspace method converges several times faster than its unconstrained version.



Figure 2 - Evolution of the output SINR for both algorithms: 
proposed vs. NCMA 


The evolution of the weight vector and the final misadjustment are difficult to appreciate in the output SINR plot. The proposed technique also achieves more precise reception patterns than the usual NCMA. This fact is shown in figure 3, where both algorithms are initialized to the optimum weight vector. The plot shows the error, computed as the distance between the optimum and the actual weight vectors, for both algorithms.



Figure 3 - Distance between the weight vectors and the 
optimum 


7. CONCLUSIONS

Through the analysis of the solutions of the Constant Modulus Cost Function we have developed a modified version of the NCMA algorithm, which exploits the fact that the optimum weight vector lies in the signal subspace of the autocorrelation matrix of the snapshots. The proposed technique speeds up the evolution of the adaptive beamformer towards the optimum solution and shows a lower misadjustment once the algorithm has converged.

Although the computational load of the proposed method is higher than that of NCMA, there are several interesting cases where the suggested algorithm is especially useful:

• when the data set available for beamforming is small

• for arrays composed of a large number of sensors

• when convergence speed becomes a critical parameter regardless of the computational load

In a forthcoming paper we will present a detailed computational comparison of both methods, obtaining expressions for the excess error introduced in each case.
ABSTRACT

The performance of the recently developed multi-user receiver and of the conventional receiver for narrow-band communication over fading channels with co-channel interference (CCI) and diversity is investigated with imperfect channel state information (CSI) estimation. The CSI estimates are derived using pilot symbols. Two estimation strategies are used, one based on interpolation and one on per-survivor processing (PSP).

The performances of the receivers are assessed by computer simulations with uncoded and trellis-coded modulation. Simulations with perfectly coherent detection are also shown for comparison's sake.

The results demonstrate that much of the advantage of the multi-user receiver is lost with this kind of CSI estimation. However, it still outperforms the conventional receiver, and it is less sensitive to the estimator used.


1. INTRODUCTION 

In narrow-band communication with a multi-beam coverage organization, frequency reuse (FR) is a key concept. The same frequency spectrum is used in different beams which are sufficiently spaced apart in order to maximize the number of users. The drawback of FR is that it introduces co-channel interference (CCI), which is generally the major source of impairment in cellular systems. Among the system solutions that have been proposed to counteract these channel impairments are channel coding, diversity and a combination of both. Recently, some concepts from multi-user communication have been investigated in the context of coded transmission over fading channels with diversity (see e.g. [1], [2] and [10]). Until now, the channel state information (CSI), i.e. the fading process affecting the signals, has been assumed available. In a real system, the CSI has to be estimated, introducing errors which degrade the system performance.

This work was in part supported by the Human-Capital and Mobility Program of the European Union.

In this paper we show how the performance is affected by channel estimation using pilot-assisted modulation. These methods have recently received much attention. Two different approaches are based on interpolation (see e.g. [3] and [4]) and on per-survivor processing (see [5], [6], [7] and [8]). The former uses only the periodically inserted known pilot symbols. The latter exploits both data and pilot symbols. It is based on a trellis algorithm and on linear prediction, and involves making separate fading estimates for the data sequences associated with the survivors. As a result, the channel estimator has two outputs: data decisions and corresponding channel estimates.

The paper is organized as follows. A description of the system model is given in section 2. The receiver structures are presented in section 3 and the channel state estimators in section 4. Some results from simulations are presented in section 5. Both coded and uncoded PSK are covered. Finally, the conclusions of this study are drawn in section 6.




2. SYSTEM MODEL 

The block diagram of the transmitter is shown in Figure 1a. Random data are encoded and passed through an interleaver. A multiplexer inserts known pilot symbols at the rate 1/nn, i.e. one pilot symbol for each group of nn − 1 information-bearing symbols. The resulting sequence is fed to the pulse-shaping filter with unit impulse response and transmitted over M channels. We consider a q-PSK based communication system, i.e. the outputs of the modulator take on values from the set $\mathcal{X}_q = \{e^{j2\pi\ell/q},\ \ell = 0, 1, \ldots, q-1\}$.
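A minimal sketch of the symbol mapping and pilot multiplexing described above follows, assuming NumPy. The helper names, the pilot value 1 (the q-PSK point with ℓ = 0) and the placement of the pilot at the start of each frame are illustrative assumptions; coding and interleaving are omitted.

```python
import numpy as np

def psk_symbols(indices, q):
    """Map integers l in {0, ..., q-1} to q-PSK points exp(j*2*pi*l/q)."""
    return np.exp(2j * np.pi * np.asarray(indices) / q)

def insert_pilots(data, nn, pilot=1.0 + 0.0j):
    """Insert one known pilot ahead of every group of nn - 1 data symbols.

    Assumes len(data) is a multiple of nn - 1, so the output consists of
    whole frames of length nn with the pilot at positions 0, nn, 2*nn, ...
    """
    groups = np.asarray(data, dtype=complex).reshape(-1, nn - 1)
    pilots = np.full((groups.shape[0], 1), pilot, dtype=complex)
    return np.hstack([pilots, groups]).ravel()

rng = np.random.default_rng(0)
q, nn = 4, 5                                   # QPSK, frame length nn = 5
data = psk_symbols(rng.integers(0, q, size=8 * (nn - 1)), q)
frame = insert_pilots(data, nn)                # 8 pilots + 32 data symbols
```

A demultiplexer at the receiver simply inverts this by taking every nn-th sample as a pilot.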

The transmitted signal x(t) passes through M fading channels with CCI and additive white Gaussian noise. The different channels are assumed uncorrelated, and the CCI signals are assumed to be of the same kind as the useful signal. In Figure 1b a single signal is indicated for simplicity.

Figure 1: System model: (a) the transmitter, (b) the channel and (c) the receiver.

At the receiving end, the signals are captured by M antennas and go through matched filters. The sampled M-vector output at time k can be expressed as

$$\mathbf{y}_k = \mathbf{g}_k^0 x_k + \sum_{j=1}^{N} \mathbf{g}_k^j x_k^j + \mathbf{n}_k \qquad (1)$$

where $\mathbf{n}_k$ is the additive noise and $\mathbf{g}_k^j$ is the fading affecting the useful ($j = 0$) and the interfering channels ($j = 1, \ldots, N$). These are complex Gaussian random vectors with mean zero and covariance $\gamma_0 \mathbf{I}_M$ and $\gamma_j \mathbf{I}_M$, respectively (where $\mathbf{I}_\ell$ denotes the $\ell \times \ell$ identity matrix). $x_k^j$ (for $j = 1, \ldots, N$) denotes the co-channel transmitted interfering signals. The useful signal is not affected by intersymbol interference (ISI). In this paper, we have assumed that the CCI is symbol-synchronous with the useful signal.

We consider normalized diversity [9]. This consists of splitting the total energy among the M diversity branches for the useful signal as well as for the interfering signals. We then have the expression

$$\gamma_j = \Gamma_j / M \qquad (2)$$

where $\Gamma_j$ is the total average energy of the j-th signal. The signal-to-interference ratio (SIR) for the j-th interfering channel is given by

$$\rho_j = \gamma_0 / \gamma_j = \Gamma_0 / \Gamma_j \qquad (3)$$

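The received vector $\mathbf{y}_k$ can be written in a more compact form as

$$\mathbf{y}_k = \mathbf{G}_k \mathbf{b}_k + \mathbf{n}_k \qquad (4)$$

where $\mathbf{b}_k = (x_k, x_k^1, \ldots, x_k^N)^T$ and $\mathbf{G}_k$ is an $M \times (N+1)$ matrix whose columns are $\mathbf{g}_k^0, \ldots, \mathbf{g}_k^N$.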
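One snapshot of the received vector under the normalized-diversity model (2)-(4) can be drawn as below. This is a sketch assuming NumPy, circularly symmetric complex Gaussian fading and noise, and a per-branch noise variance N0 that the text does not specify; the function name is hypothetical.

```python
import numpy as np

def received_vector(b, Gamma, M, N0, rng):
    """Draw y_k = G_k b_k + n_k (eq. (4)) for one time instant.

    b     : (N+1,) complex symbols; b[0] useful, b[1:] co-channel interferers
    Gamma : (N+1,) total average energy Gamma_j of each signal
    M     : number of diversity branches (antennas)
    N0    : noise variance per branch (assumed parameter)
    """
    b = np.asarray(b, dtype=complex)
    gamma = np.asarray(Gamma, dtype=float) / M        # eq. (2): gamma_j = Gamma_j / M
    shape = (M, b.size)
    # Column j ~ CN(0, gamma_j I_M): real and imaginary parts each carry gamma_j / 2.
    G = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) * np.sqrt(gamma / 2)
    n = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) * np.sqrt(N0 / 2)
    return G @ b + n, G

rng = np.random.default_rng(1)
sir_db = 20.0
Gamma = np.array([1.0, 10 ** (-sir_db / 10)])         # eq. (3): rho_1 = Gamma_0 / Gamma_1
b = np.array([1.0 + 0j, 1j])                          # useful QPSK symbol, one interferer
y, G = received_vector(b, Gamma, M=2, N0=0.0, rng=rng)
```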

Depending on the channel estimator, the pilot symbols might be separated by a demultiplexer; this is only the case when an interpolator is used. With the PSP estimator, the received signals follow two parallel branches. One branch is fed to a channel state estimator which generates estimates $\hat{\mathbf{g}}_k^0$ of the fading samples.

In the case of the multi-user receiver, it is necessary to estimate the fading affecting the interfering signals as well. This is not shown in the figure for simplicity. In order to obtain the estimates of the fading affecting the interfering signals, we assume that the SIR is inverted in the corresponding estimator. In practical terms, we might think of two beam-formers, each pointing at users in different locations.

The other branch is delayed to be in step with the channel state estimates and, after deinterleaving, fed to the receivers together with the fading estimates. The outputs of the receiver are branch metrics which are fed to the Viterbi decoder.

3. RECEIVER STRUCTURES 
3.1. CONVENTIONAL RECEIVER 

This receiver is formed by a linear maximal ratio combiner followed by a branch-metric computer and by a Viterbi decoder matched to the coded modulation scheme chosen. The combined channel output at instant k is given by [9]:

$$r_k = (\mathbf{g}_k^0)^\dagger \mathbf{y}_k \qquad (5)$$

where † denotes Hermitian transpose. The combined output $r_k$ is fed to a metric computer based on the Euclidean metric

$$m(r_k, x_k) = 2\,\mathrm{Re}\{r_k x_k^*\} \qquad (6)$$

which is maximum likelihood (ML) in the absence of CCI. The set of branch metrics $\{m(r_k, x_k),\ x_k \in \mathcal{X}_q\}$ is finally fed to the Viterbi decoder. This receiver requires CSI, timing and carrier phase recovery for the useful signal only.
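The per-symbol computation of the conventional receiver, eqs. (5)-(6), reduces to a few lines. This is a sketch assuming NumPy and a single time instant; the fading vector passed in may be either the true $\mathbf{g}_k^0$ or an estimate, and the function name is hypothetical.

```python
import numpy as np

def conventional_metrics(y, g0, q):
    """Branch metrics m(r_k, x) = 2 Re{r_k x*} for every q-PSK point x.

    y  : (M,) received vector at time k
    g0 : (M,) fading vector of the useful signal (true or estimated)
    """
    r = np.vdot(g0, y)                         # eq. (5): r_k = g0^H y_k (vdot conjugates g0)
    X = np.exp(2j * np.pi * np.arange(q) / q)  # q-PSK alphabet X_q
    return 2 * np.real(r * np.conj(X))         # eq. (6)

g0 = np.array([1.0 + 1.0j, 0.5 - 0.2j])
X = np.exp(2j * np.pi * np.arange(4) / 4)
metrics = conventional_metrics(g0 * X[2], g0, 4)   # noiseless, CCI-free, symbol index 2
```

In a complete receiver these q values would be handed to the Viterbi decoder for every coded symbol.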

3.2. MULTI-USER RECEIVER 

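This receiver structure needs the CSI, the timing and the carrier recovery for the interfering signals as well as for the useful signal. For each $a \in \mathcal{X}_q$ we define the set of $(N+1)$-vectors

$$\mathcal{S}(a) = \{\mathbf{b} = (a, x^1, \ldots, x^N)^T : x^j \in \mathcal{X}_q,\ j = 1, \ldots, N\} \qquad (7)$$

Under the assumption of ideal interleaving, given the value $a$ of its first component, the random vector $\mathbf{b}_k$ is conditionally uniformly distributed over $\mathcal{S}(a)$. It is shown in [10] that the ML branch metric for the Viterbi decoder is given by

$$m(\mathbf{y}_k, x_k) = \log \sum_{\mathbf{b}_k \in \mathcal{S}(x_k)} \exp(\mu_k(\mathbf{b}_k)) \qquad (8)$$

where

$$\mu_k(\mathbf{b}_k) = 2\,\mathrm{Re}\{\mathbf{b}_k^\dagger \mathbf{G}_k^\dagger \mathbf{y}_k\} - \mathbf{b}_k^\dagger \mathbf{H}_k \mathbf{b}_k \qquad (9)$$

and $\mathbf{H}_k = \mathbf{G}_k^\dagger \mathbf{G}_k$ is the instantaneous $(N+1) \times (N+1)$ correlation matrix of the fading vectors. For sufficiently high SNR, metric (8) is well approximated by the simpler metric

$$m'(\mathbf{y}_k, x_k) = \max_{\mathbf{b}_k \in \mathcal{S}(x_k)} \mu_k(\mathbf{b}_k) \qquad (10)$$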

With perfect CSI, this receiver achieves the same BER 
curve slope as interference-free transmission. 
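The multi-user metrics (7)-(10) can be sketched by brute-force enumeration of $\mathcal{S}(a)$, which costs $q^N$ evaluations of (9) per candidate symbol. The sketch assumes NumPy and a noise variance normalized to one, so that (9) equals the log-likelihood up to a constant; the function name is hypothetical. The log-sum-exp is computed in the numerically stable shifted form, and its leading max term is exactly the high-SNR approximation (10).

```python
import itertools
import numpy as np

def multiuser_metrics(y, G, q):
    """Exact ML metric (8) and its max approximation (10) for every useful symbol a.

    y : (M,) received vector; G : (M, N+1) fading matrix, column 0 = useful signal.
    Returns two arrays of length q indexed like the q-PSK alphabet.
    """
    X = np.exp(2j * np.pi * np.arange(q) / q)
    N = G.shape[1] - 1
    H = G.conj().T @ G                          # H_k = G_k^H G_k
    z = G.conj().T @ y                          # G_k^H y_k
    exact, approx = np.empty(q), np.empty(q)
    for ia, a in enumerate(X):
        # All b in S(a): first entry fixed to a, interferer symbols enumerated.
        candidates = (np.array((a,) + rest) for rest in itertools.product(X, repeat=N))
        mus = np.array([2 * np.real(np.vdot(b, z)) - np.real(np.vdot(b, H @ b))  # eq. (9)
                        for b in candidates])
        mmax = mus.max()
        exact[ia] = mmax + np.log(np.exp(mus - mmax).sum())   # eq. (8), stable log-sum-exp
        approx[ia] = mmax                                      # eq. (10)
    return exact, approx

rng = np.random.default_rng(2)
q, M, N = 4, 2, 1
G = (rng.standard_normal((M, N + 1)) + 1j * rng.standard_normal((M, N + 1))) / np.sqrt(2)
b_true = np.exp(2j * np.pi * np.array([3, 1]) / q)
exact, approx = multiuser_metrics(G @ b_true, G, q)   # noiseless observation
```

The gap between (8) and (10) is bounded by $\log|\mathcal{S}(a)| = N \log q$, which is why the max form is an adequate substitute when one term dominates the sum.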

4. CHANNEL ESTIMATION 
4.1. INTERPOLATION 

In general, the estimated fading samples can be defined as the output of a filter bank:

$$\hat{g}_k = \sum_{i=-K+1}^{K} h_k(i)\, g_{i \cdot nn} = \mathbf{h}_k^T \mathbf{g}_{nn}, \qquad 0 < k < nn \qquad (11)$$

where $\mathbf{h}_k = (h_k(-K+1) \cdots h_k(K))^T$ are the filter coefficients, and $\mathbf{g}_{nn} = (g_{(-K+1) \cdot nn} \cdots g_{K \cdot nn})^T$ is the vector of the fading samples affecting the pilot symbols.


We consider two different interpolation techniques: sinc interpolation and optimal interpolation. The filter coefficients are given by

$$h_k(i) = \mathrm{sinc}(k/nn - i) \qquad (12)$$

$$\mathbf{h}_k^o = (\mathbf{R}_{p,nn} + \sigma^2 \mathbf{I})^{-1} \mathbf{w}_k \qquad (13)$$

respectively, where $\mathbf{R}_{p,nn} = E[\mathbf{g}_{nn} \mathbf{g}_{nn}^\dagger]$ and $\mathbf{w}_k = E[\mathbf{g}_{nn} g_k^*]$. We see that for the optimal interpolator, we need to estimate the autocorrelation function of the fading samples affecting the pilot symbols. For the sinc interpolator, this is not the case; however, its filters are of infinite length. In the simulations, the filters are truncated to length 512, giving negligible degradation in performance.

In the calculation of the optimal interpolator, we need the noise variance $\sigma^2$ as well. As pointed out in [4], the CCI can be approximated as Gaussian noise. The resulting $\sigma^2$ is therefore the sum of the interference power and the additive Gaussian noise power.
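To make (12)-(13) concrete, the sketch below builds both coefficient vectors for a finite window and compares their mean square error. It assumes NumPy, a Clarke (Jakes) fading autocorrelation $R(\tau) = J_0(2\pi B_dT \tau)$, which the text does not state, and a short window instead of the 512 taps used in the simulations. $J_0$ is evaluated from its integral representation so that no SciPy dependency is needed; all function names are hypothetical.

```python
import numpy as np

def bessel_j0(x, n=2000):
    """J0(x) = (1/pi) * integral_0^pi cos(x sin t) dt, via the midpoint rule."""
    t = (np.arange(n) + 0.5) * np.pi / n
    return np.cos(np.multiply.outer(np.asarray(x, dtype=float), np.sin(t))).mean(axis=-1)

def pilot_lags(K):
    return np.arange(-K + 1, K + 1)                 # pilot indices i = -K+1, ..., K

def sinc_coeffs(k, nn, K):
    """Sinc interpolator, eq. (12): h_k(i) = sinc(k/nn - i)."""
    return np.sinc(k / nn - pilot_lags(K))          # np.sinc(x) = sin(pi x)/(pi x)

def corr_stats(k, nn, K, bdT):
    """R_{p,nn} = E[g_nn g_nn^H] and w_k = E[g_nn g_k^*] under the assumed Clarke model."""
    i = pilot_lags(K)
    R = bessel_j0(2 * np.pi * bdT * nn * np.subtract.outer(i, i))
    w = bessel_j0(2 * np.pi * bdT * (i * nn - k))
    return R, w

def wiener_coeffs(k, nn, K, bdT, sigma2):
    """Optimal interpolator, eq. (13): h_k = (R_{p,nn} + sigma^2 I)^{-1} w_k."""
    R, w = corr_stats(k, nn, K, bdT)
    return np.linalg.solve(R + sigma2 * np.eye(R.shape[0]), w)

def interp_mse(h, k, nn, K, bdT, sigma2):
    """E|g_k - h^T (g_nn + noise)|^2 for unit-power fading (real correlations)."""
    R, w = corr_stats(k, nn, K, bdT)
    return 1.0 - 2.0 * h @ w + h @ (R + sigma2 * np.eye(R.shape[0])) @ h

hs = sinc_coeffs(2, 5, 4)                            # k = 2, nn = 5, 8 pilot taps
hw = wiener_coeffs(2, 5, 4, 0.01, 0.1)               # B_dT = 0.01, sigma^2 = 0.1
```

Because (13) is the minimizer of exactly the quadratic evaluated by `interp_mse`, the Wiener taps can only match or beat the truncated sinc taps under the assumed model; the price is that they need the autocorrelation and $\sigma^2$, as the text notes.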

4.2. PER-SURVIVOR-PROCESSING (PSP) 

Recently, channel estimation using per-survivor processing (PSP) has received some attention [6], [7]. Several variations have been proposed. In this paper we limit ourselves to the technique called Basic Algorithm (BA) in [6].

The estimates of the channel state are obtained using linear prediction and tentative symbol decisions (PSK symbols have unit modulus):

$$\hat{\mathbf{g}}_k^i = \sum_{n=1}^{L} c_{n-1}\, \mathbf{y}_{k-n} (x_{k-n}^i)^* \qquad (14)$$

where the coefficients $\mathbf{c} = (c_0 \cdots c_{L-1})^T$ minimize the mean square estimation error. They must satisfy the normal equations [12]:

$$\sum_{i=0}^{L-1} R_p(m - i)\, c_i = R_p(m + 1), \qquad 0 \le m \le L - 1 \qquad (15)$$

where $R_p(k)$ is the autocorrelation function of the fading samples. The index i in (14) corresponds to one particular survivor in the trellis. The $\hat{\mathbf{g}}_k^i$'s are then used in the branch metric computation:

$$m(x_{k-1}^i \to x_k^j) = \|\mathbf{y}_k - \hat{\mathbf{g}}_k^i x_k^j\|^2 \qquad (16)$$

where $m(x_{k-1}^i \to x_k^j)$ denotes the branch metric from state i to state j in the trellis.
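The per-survivor predictor can be sketched as follows: solve the normal equations (15) once, then, for each survivor, form the prediction (14) from that survivor's past decisions and score a branch with (16). This assumes NumPy, a real and even autocorrelation supplied as a Python callable, and unit-modulus PSK symbols; the function names are hypothetical, and the trellis bookkeeping of the Basic Algorithm in [6] is omitted.

```python
import numpy as np

def predictor_coeffs(Rp, L):
    """Solve eq. (15): sum_i Rp(m - i) c_i = Rp(m + 1), m = 0, ..., L-1."""
    m = np.arange(L)
    return np.linalg.solve(Rp(np.subtract.outer(m, m)), Rp(m + 1))

def psp_branch_metric(y_k, y_past, x_past, x_k, c):
    """Eq. (14) prediction along one survivor followed by the metric of eq. (16).

    y_past[n-1] = y_{k-n} (M-vectors), x_past[n-1] = the survivor's decision x_{k-n},
    x_k = candidate symbol on the branch. Since |x| = 1, y_{k-n} x_{k-n}^* is a
    (noisy) measurement of the fading vector.
    """
    g_hat = sum(c[n] * y_past[n] * np.conj(x_past[n]) for n in range(len(c)))
    return float(np.sum(np.abs(y_k - g_hat * x_k) ** 2))

# AR(1)-type autocorrelation a^|tau|: the optimal one-step predictor is [a, 0, ..., 0].
c = predictor_coeffs(lambda tau: 0.9 ** np.abs(tau), 4)
```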

Instead of using the channel estimates directly, we use the tentative symbol decisions and derive new channel estimates from them. This is done with a Wiener interpolator:

$$\hat{\mathbf{g}}_k = \sum_{n=-K}^{K} d_n\, \mathbf{y}_{k-n}\, \hat{x}_{k-n}^* \qquad (17)$$

where the $d_n$ satisfy the equations

$$\sum_{i=-K}^{K} R_p(m - i)\, d_i + \sigma^2 d_m = R_p(m), \qquad -K \le m \le K \qquad (18)$$

The reason for doing this is to involve an interpolation over past and future channel measurements, and not only a prediction from past measurements.

We have assumed that the predictor/interpolator coefficients can be optimally chosen according to (15) and (18). As they depend on the noise variance and on the autocorrelation function of the fading, which are non-stationary, these parameters have to be estimated as a function of time. Solutions to this problem are indicated in [6]. The PSP algorithm implemented performs well for QPSK signals. For 8PSK signals the performance is not as good, because the error rate of the tentative symbol decisions is higher, which perturbs the prediction of the fading samples. Other, more complex algorithms are reported to work well even with 8PSK. They have a more complex trellis structure, a more complex branch metric computation, or a combination of both.


5. SIMULATION RESULTS 

In this section we report on the bit-error-rate (BER) performance of the conventional and the multi-user receiver with imperfect CSI. We compare these results with those corresponding to perfectly coherent detection (perfect CSI). The two cases of uncoded QPSK and coded 8PSK are considered. The coding used is Ungerboeck's 8-state 8PSK TCM scheme. The diversity order is 2, and the normalized Doppler bandwidth is $B_dT = 0.01$. The frame length nn is 5, giving a loss in effective $E_b/N_0$ of $10\log_{10}(nn/(nn-1)) \approx 0.97$ dB. This has been taken into account by shifting the BER curves 0.97 dB to the right. In order to obtain a benchmark for the performance, the curves corresponding to perfect CSI are shifted to the right as well. When coding is implemented, a block interleaver of size 15 × 15 is used.
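The 0.97 dB shift and the 15 × 15 interleaver can be reproduced with the small sketch below. It assumes NumPy and a conventional write-by-rows, read-by-columns block interleaver (the text does not say how the interleaver is read); the function names are hypothetical.

```python
import numpy as np

def pilot_loss_db(nn):
    """Eb/N0 penalty of spending one symbol in nn on a pilot: 10 log10(nn / (nn - 1))."""
    return 10 * np.log10(nn / (nn - 1))

def block_interleave(symbols, rows, cols):
    """Write row by row into a rows x cols array, read column by column."""
    return np.asarray(symbols).reshape(rows, cols).T.ravel()

def block_deinterleave(symbols, rows, cols):
    """Inverse of block_interleave."""
    return np.asarray(symbols).reshape(cols, rows).T.ravel()

seq = np.arange(15 * 15)
out = block_interleave(seq, 15, 15)
```

Adjacent input symbols end up 15 positions apart after interleaving.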

For the uncoded case, the signal-to-interference ratio (SIR) is set to 20 dB. The estimators used are the sinc interpolator and the PSP estimator. The performance of the two receivers is very close for small $E_b/N_0$ in the ideal case (see Figures 2 and 3). However, the conventional receiver exhibits an error floor, which is not the case for the multi-user receiver. This is true for any finite SIR. With imperfect CSI estimation, both receivers exhibit error floors. With the PSP estimator, the performance is very close for the two receivers. We see, however, that the conventional receiver is more sensitive to the estimator: with the sinc interpolator, its error floor is twice as high as that of the multi-user receiver.

For the coded case, the SIR is 10 dB. The estimators used are the sinc interpolator and the optimal interpolator. With the optimal interpolator, the performance of the two receivers is very close (see Figures 4 and 5). With the sinc interpolator, however, the multi-user receiver performs much better, showing that it is less sensitive to bad channel estimates.

6. CONCLUSIONS

The performance of the conventional and the multi-user receiver for PSK signals transmitted over flat Rayleigh fading channels with CCI and diversity has been shown with imperfect CSI estimation using pilot symbols. For perfect CSI the multi-user receiver outperforms the conventional receiver and is especially efficient for low SIRs. The simulation results show that much of this advantage is lost when the CSI is estimated using the proposed techniques. It can be noted, however, that the multi-user receiver is less sensitive to bad channel estimates than the conventional receiver.

Channel estimates obtained with pilot symbols will clearly be poor for low SIRs, because the interference is seen as additive Gaussian noise by the CSI estimator. One way to avoid this problem is to use non-overlapping pilot tones. This strategy introduces other problems, such as spectral lines in the transmission band, but should be investigated for this kind of system.
Figure 2: BER for the conventional receiver with uncoded QPSK, SIR = 20 dB, B_dT = 0.01 and nn = 5.

Figure 3: BER for the multi-user receiver with uncoded QPSK, SIR = 20 dB, B_dT = 0.01 and nn = 5.

Figure 4: BER for the conventional receiver with coded 8PSK, SIR = 10 dB, B_dT = 0.01 and nn = 5.

Figure 5: BER for the multi-user receiver with coded 8PSK, SIR = 10 dB, B_dT = 0.01 and nn = 5.
ABSTRACT

The Cerebellar Model Articulation Controller (CMAC) is attracting a great deal of interest due to its rapid on-line learning, generalisation properties and simplicity. In this paper, the ability of the CMAC and the Generalized CMAC (GCMAC) to approximate an arbitrary input/output mapping is analysed. An expression for the dimension of the null space of the set of basis functions provided by the GCMAC is derived. From this expression it is possible to measure the space of functions that can be exactly modelled by the GCMAC. A set of local restrictions on the surfaces that can be exactly approximated is given. As expected, these local restrictions imply that only smooth surfaces are adequately stored.

1. INTRODUCTION

The CMAC network [1] was proposed as a control method based on the principles of the cerebellum's motor behaviour. The simplicity of the CMAC learning algorithm and its ability to generalise from sparse input-output data pairs have led to its widespread use in many engineering applications, e.g. prediction of chaotic time series [2], real-time robotic control [3] and nonlinear deconvolution [4]. However, in many applications in digital communications such as data predistortion [5, 6], electrical echo cancellation [7] or nonlinear equalisation, the accuracy of the approximation is a more important factor than generalisation. Consequently, evaluating the function representation capabilities of CMAC is an important issue. Previously, only preliminary studies of the interpolation capabilities of CMAC have been reported [8]. One approach is to investigate the hyper-surfaces that can, and cannot, be modelled exactly. Thus, in this paper we present a study of the dimension of the space spanned by the basis functions of the Generalized CMAC (GCMAC) [9]. This can be considered as a bound on the range of functions which the GCMAC can model. We also identify certain features of the functions which the GCMAC is unable to model.


2. DISCRETE HYPER-SURFACE 
APPROXIMATION 


General approximation theory deals with the problem of approximating a multivariate function z = F(x) by a function z = H(w, x) having a fixed number of parameters (weights), w. Spline interpolation and many approximation schemes, such as expansions in series of orthogonal polynomials, are included in this representation. In particular, the CMAC network provides a set of basis functions which, when suitably adjusted in amplitude, approximate the desired hyper-surface.


Figure 1: CMAC Input/Output Mapping using binary basis functions.


The GCMAC internally transforms every training input into a higher-dimensional space so that the desired output can be made approximately linear in the transformed input. In this way, the output of the GCMAC is formed by a linear combination of overlapping basis functions which are distributed in an n-dimensional subspace of $\mathbb{Z}^n$:

$$\hat{z}(\mathbf{x}) = \sum_{i=1}^{\rho_{MAX}} w_{c(i)} \qquad (1)$$

where c = (c(1), …, c(ρ_MAX)) contains the indices of the local functions activated by the input vector (e.g., in Fig. 2, if x = (1,1) then c = (1,5,9)).
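A minimal sketch of the GCMAC output (1) for integer inputs follows. The overlay layout is an assumption: overlay j (counted from 0) is taken to be displaced by j mod ρ_i along axis i, which is meant to mirror the displacements $d_{ij}$ defined in Section 5.1 up to a shift of origin. Basis functions are identified by (overlay, cell) tuples rather than by the numbering of Fig. 2, and the function names are hypothetical.

```python
def active_cells(x, rho):
    """Basis functions activated by integer input x, one per overlay.

    Hypothetical layout: overlay j in {0, ..., rho_MAX - 1} is displaced by
    (j mod rho_i) along axis i, and its receptive fields have width rho_i.
    Each basis function is identified by the key (j, cell_1, ..., cell_n).
    """
    return [(j,) + tuple((xi + (j % ri)) // ri for xi, ri in zip(x, rho))
            for j in range(max(rho))]

def gcmac_output(x, rho, weights):
    """Eq. (1): sum of the weights of the rho_MAX activated basis functions."""
    return sum(weights.get(key, 0.0) for key in active_cells(x, rho))

rho = [2, 3]                                   # generalisation vector of Fig. 2
cells = active_cells((1, 1), rho)              # three keys, one per overlay
weights = {key: 1.0 for key in cells}
```

Neighbouring inputs share some of their active cells, which is the mechanism behind both the generalisation and the restrictions discussed below.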



Since the approximation used by the GCMAC network is linear in the unknown coefficients w, there always exists a choice of weights that approximates F(·) better than all other possible choices [10]. In this sense, simple instantaneous learning laws can be used, for which convergence can be established subject to well-understood restrictions.

3. THE GCMAC SCHEME : DEFINITION 
OF THE SET OF BASIS FUNCTIONS 

The set of basis functions shown in Fig. 2 can be geometrically decomposed into a set of ρ_MAX overlays. An overlay is defined as the union of basis functions with non-overlapping hyper-rectangular receptive fields which exactly covers the discretized input space. The size of the hyper-rectangles is defined by the generalisation vector ρ = [ρ_1, …, ρ_n], with ρ_MAX = max(ρ).

These overlays have different partitionings of the receptive fields, so that the same input maps to different elements of the set of basis functions in different overlays. An example of the overlay structure is given in Fig. 2.

Figure 2: CMAC overlay structure: ρ = [2,3].

The generalisation vector ρ significantly affects the approximation capability and the rate of convergence of the network. When the elements of ρ are chosen too large, the network is slow to learn a function containing high spatial Fourier components. On the other hand, when the elements of ρ are chosen too small, the network is unable to generalise between neighbouring training samples. A heuristic rule to determine the generalisation vector can be stated as follows: a small receptive-field width must be used where the function varies significantly, while a larger one should be used where the function is approximately constant. In addition, this strategy substantially improves the rate of convergence. As an example, Fig. 3.a shows a deterministic surface and Fig. 3.b some of the functions employed in the approximation with a generalisation vector ρ = [2,15]. The convergence rate obtained with different configurations of GCMAC is shown in Fig. 4.



Figure 3: a) Original surface, b) Some basis functions of a GCMAC with ρ = [2,15].

Figure 4: Curve 1: Standard CMAC (ρ = [15,15]). Curve 2: Standard CMAC (ρ = [2,2]). Curve 3: Proposed GCMAC (ρ = [2,15]). Curve 4: Proposed GCMAC (ρ = [15,2]). Traces are ensemble averages of 15 convergence curves.


From Fig. 4 it can be inferred that the approximation capabilities of GCMAC improve when the generalisation vector is reduced. In particular, for a two-dimensional input space, the approximation capabilities of GCMAC depend only on the minimum element (ρ_min) of the generalisation vector, whereas the position of ρ_min within ρ affects only the rate of convergence, not the final approximation error.

4. REPRESENTATION CAPABILITIES OF 
GCMAC 

Many neural networks can approximate an input-output map arbitrarily well (given infinite resources) under some typically very reasonable conditions. In this work, we are interested in answering the following question: for a given GCMAC structure (finite resources), what kind of functions can be modelled exactly? In general, the GCMAC cannot reproduce an arbitrary multivariate Look-Up-Table (LUT) z = F(x). This can be considered as an upper bound on the approximation capabilities of the GCMAC. Brown et al. [8] have proved that linearly combined univariate piecewise constant LUTs ($z = \sum_i F_i(x_i)$) can be exactly approximated by the CMAC for any value of ρ. This class of functions is the lower bound of the set of functions which are exactly modelled by a GCMAC. The answer to our question lies between these two bounds.

In discussing the modelling capabilities of GCMAC it is necessary to answer the question of whether a multivariate function can, or cannot, be exactly represented. For this reason, it is useful to specify the number of degrees of freedom wasted in selecting the best set of weights or, in other words, the number of linearly dependent basis functions of GCMAC. Once this bound on the approximation capabilities is set, the next step is to derive a set of relationships which must exist in the data for an exact representation. Having described the set of restrictions on the input data, it is then possible to construct functions which fail to satisfy any of the restrictions. These functions are said to be orthogonal to each of the basis functions of GCMAC.

5. SPACE OF FUNCTIONS 
A matrix interpretation of the GCMAC input/output mapping is given by

$$\hat{\mathbf{z}} = \mathbf{A}\mathbf{w} \qquad (2)$$

where A is an N × M matrix in which each row is the association vector a assigned to an input vector x, N is the cardinality of the set of integer input vectors (if the number of integer values along the i-th dimension is $L_i$, then $N = \prod_{i=1}^{n} L_i$), and M is the number of basis functions of the GCMAC. Starting with the matrix A, which has entries in {0, 1}, one may want to know how many columns (or rows) of this matrix are non-parallel or independent of each other. The GCMAC solution matrix A is rank-deficient when ρ_i ≥ 2 [8]. This means that the set of functions provided by the GCMAC is linearly dependent. Assuming that the rank of A is R < M < N, a unitary matrix U can be chosen such that the R-dimensional column space of A is spanned by a subset of R columns of U, say the first R columns, which together form the matrix $\mathbf{U}_R$; then

$$\mathbf{U} = [\mathbf{U}_R \;\; \mathbf{U}_R^{\perp}] \qquad (3)$$

Since U is unitary, any vector z can be decomposed into two mutually orthogonal vectors $\mathbf{z}_R$ and $\mathbf{z}_R^{\perp}$ in the spaces spanned by the columns of $\mathbf{U}_R$ and of its orthogonal complement $\mathbf{U}_R^{\perp}$. In this sense, the space of functions that can be exactly modelled by the GCMAC is spanned by the columns of $\mathbf{U}_R$, and the functions that cannot be modelled by the GCMAC lie in the space spanned by the columns of $\mathbf{U}_R^{\perp}$.

The same comments hold for the row space of A, and a unitary matrix V, of size M × M, can be similarly found and decomposed into two orthogonal blocks:

$$\mathbf{V} = [\mathbf{V}_R \;\; \mathbf{V}_R^{\perp}] \qquad (4)$$

The columns of $\mathbf{V}_R^{\perp}$ span the null space (or kernel) of A, i.e., the space of weight vectors w for which A w = 0.

5.1. DIMENSION OF THE NULL SPACE OF 
GCMAC 

The dimension of the null space of the linear operator A determines the number of degrees of freedom wasted in the choice of the best set of weights. The computation of the dimension of the null space of A is tedious and strongly dependent on the parameters which specify the GCMAC, in particular on the generalisation vector ρ and the number of levels along each axis, $L_i$. As an example, for n = 2 (two-dimensional input space) and a generalisation vector ρ = [ρ_1, ρ_2], the dimension is given by:

dim(ker(A)) = ^ (j ^ ^ (5) 

i=i 

where $d_{ij} = \mathrm{mod}(j - 1, \rho_i) + 1$. When ρ_1 = ρ_2 = ρ (standard CMAC), the dimension of the null space is ρ − 1. This result is in agreement with the fact that the number of common weights shared by adjacent input points is just ρ − 1.
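The rank deficiency discussed in this section can be checked numerically. The sketch below builds the binary association matrix A of (2) for a small two-dimensional lattice, then obtains dim(ker(A)) from (6) and a basis of the kernel (the columns of $\mathbf{V}_R^{\perp}$ in (4)) from the SVD. It assumes NumPy and a hypothetical overlay layout (overlay j displaced by j mod ρ_i along axis i), so the counts it produces hold for that layout only and are not a substitute for formula (5); the function names are hypothetical.

```python
import itertools
import numpy as np

def association_matrix(L, rho):
    """Binary N x M matrix A of eq. (2) for a lattice with L[i] levels per axis.

    Hypothetical layout: overlay j in {0, ..., max(rho) - 1} is displaced by
    (j mod rho_i) along axis i; columns are created as basis functions are hit.
    """
    inputs = list(itertools.product(*(range(Li) for Li in L)))
    column = {}                                   # (overlay, cells...) -> column index
    active = []
    for x in inputs:
        keys = [(j,) + tuple((xi + (j % ri)) // ri for xi, ri in zip(x, rho))
                for j in range(max(rho))]
        active.append([column.setdefault(k, len(column)) for k in keys])
    A = np.zeros((len(inputs), len(column)))
    for row, cols in enumerate(active):
        A[row, cols] = 1.0
    return A

def kernel_basis(A, tol=1e-10):
    """Orthonormal basis of ker(A) from the SVD (columns of V_R^perp)."""
    _, s, Vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s.max()))
    return Vh[rank:].T                            # A is real, so no conjugation is needed

A = association_matrix([4, 4], [2, 2])            # standard CMAC, rho = 2, 4 x 4 lattice
K = kernel_basis(A)
nullity = A.shape[1] - np.linalg.matrix_rank(A)   # eq. (6): dim ker(A) = M - R
```

For this small case the kernel is one-dimensional (adding a constant to one overlay and subtracting it from the other leaves every output unchanged), which agrees with ρ − 1 for ρ = 2; other values of ρ and $L_i$ can be explored the same way.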

5.2. DEPENDENCE RELATIONSHIPS OF 
THE DATA 

Once the number of linearly dependent basis functions of GCMAC is determined, the rank of A can be evaluated:

$$R = \mathrm{rank}(\mathbf{A}) = M - \dim(\ker(\mathbf{A})) \qquad (6)$$

Clearly, the GCMAC is only able to exactly model hyper-surfaces satisfying a certain number of restrictions on their values. In particular, there will exist N − rank(A) non-redundant equations which specify the relationships that the desired function must satisfy in order to be approximated exactly. For clarity, a geometric interpretation of these relationships is presented below.

5.2.1. GEOMETRIC INTERPRETATION 
Consider an n-dimensional lattice which is composed of n-dimensional hyper-rectangles (receptive fields). The width of the hyper-rectangles along the i-th axis is ρ_i. Let F(x) be a function which has a zero value for all the inputs lying outside the square block composed of four adjacent points of the n-dimensional input space, as shown in Fig. 5.

Figure 5: A two-dimensional block of data.

Depending on the relative position of the block within the lattice, two situations can occur:

1. When both (n−1)-dimensional hyper-planes which cut the block lie on the same overlay (the block lies on a knot of the lattice), there are enough degrees of freedom to choose the weights of the GCMAC. Therefore, the GCMAC is able to exactly represent the desired output in the region defined by the block (see Fig. 6).

2. However, when the two (n−1)-dimensional hyper-planes which cut the block lie on different overlays, the GCMAC is only able to exactly model the desired hyper-surface when certain restrictions on the data are imposed. The outputs corresponding to the block shown in Fig. 7 are given by:

$$\begin{aligned} z_1 &= w_6 + w_{16} + w_{25} + w_{32} \\ z_2 &= w_6 + w_{17} + w_{25} + w_{32} \\ z_3 &= w_6 + w_{16} + w_{22} + w_{32} \\ z_4 &= w_6 + w_{17} + w_{22} + w_{32} \end{aligned} \qquad (7)$$

From these expressions, the following relationship must be verified by the desired output:

$$z_1 - z_3 = z_2 - z_4 \iff z_1 + z_4 = z_2 + z_3 \qquad (8)$$

The number of blocks which are not corner-aligned with the knots of the lattice determines the number of restrictions required to exactly model the desired hyper-surface. This number is equal to N − rank(A).

Figure 6: A two-dimensional block of data lying on a knot of the GCMAC overlay structure (panels: overlays 1 to 4), ρ = [3,4], L = [8,8].

Figure 7: The block of data does not lie on a knot of the GCMAC overlay structure (panels: overlays 1 to 4).

It is important to note that when the desired hyper-surface does not satisfy equation (8), the GCMAC is not able to model the function exactly, i.e. some error is produced when the GCMAC performs the approximation (the GCMAC reproduces the desired function in a least-squares sense). The restrictions required to exactly model a specific hyper-surface can be roughly condensed in the following heuristic rule: the slope of the hyper-surface at adjacent points (difference between neighbouring points) must be (nearly) the same. This rule has an evident connection with the well-known smoothness hypothesis required for performing general function approximation.

5.3. ORTHOGONAL FUNCTIONS 
As stated above, the GCMAC is only able to reproduce hyper-surfaces that satisfy certain restrictions. These restrictions stem from the overlap between the receptive fields of the basis functions. When the smoothness hypothesis is violated, the GCMAC will clearly be unable to reproduce the required hyper-surface exactly. For this reason, it seems reasonable that the functions orthogonal to the basis functions of GCMAC must have large spatial-frequency components. Following this line, we have found certain conditions that the orthogonal functions must verify. If we construct a function which has a zero value for all the inputs lying outside the block shown in figure 3, and whose values at the input points lying inside the block satisfy

$$z_1 = z_4 = 1, \qquad z_2 = z_3 = -1 \qquad (9)$$

then the GCMAC is unable to reproduce the desired function. After some straightforward algebraic manipulation it is easily derived that the best (in a least-squares sense) vector of weights is identically zero, giving the GCMAC a null output. Whenever a block satisfies equation (9) and is placed completely within the lattice such that the two (n−1)-dimensional hyper-planes which cut the block do not lie on the same overlay, the GCMAC will be unable to represent the desired function. Any linear combination of two or more functions like those specified in equation (9) will also be orthogonal to the basis functions of GCMAC.
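Equations (7)-(9) can be verified directly on the four-point block. The sketch below, assuming NumPy, encodes the incidence of (7) as a 4 × 6 matrix (rows z1 to z4, columns w6, w16, w17, w22, w25, w32), checks that every output it can produce satisfies (8), and confirms that the pattern (9) is orthogonal to all six columns, so its minimum-norm least-squares weight vector is zero. Only the block itself is modelled; because the target is zero at every other input, the remaining rows of A do not contribute to $\mathbf{A}^T\mathbf{z}$.

```python
import numpy as np

# Incidence of eq. (7): rows z1..z4, columns w6, w16, w17, w22, w25, w32.
A_block = np.array([
    [1, 1, 0, 0, 1, 1],   # z1 = w6 + w16 + w25 + w32
    [1, 0, 1, 0, 1, 1],   # z2 = w6 + w17 + w25 + w32
    [1, 1, 0, 1, 0, 1],   # z3 = w6 + w16 + w22 + w32
    [1, 0, 1, 1, 0, 1],   # z4 = w6 + w17 + w22 + w32
], dtype=float)

def satisfies_restriction(z, tol=1e-9):
    """Eq. (8): z1 + z4 == z2 + z3 (up to rounding)."""
    return abs(z[0] + z[3] - z[1] - z[2]) < tol

z_checker = np.array([1.0, -1.0, -1.0, 1.0])                   # eq. (9)
w_ls, *_ = np.linalg.lstsq(A_block, z_checker, rcond=None)     # minimum-norm LS weights
```

The block matrix has rank 3, so exactly one restriction (8) applies to these four outputs, and the pattern (9) spans the complementary direction.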

If the degree of generalisation is reduced along some direction, then the knot density in the lattice is increased and, therefore, the number of functions orthogonal to the basis functions of GCMAC is reduced. For particular choices of the generalisation vector ρ, the number of basis functions, M, may be greater than the number of input vectors, N. In this case, the linear system described in equation (2) is underspecified, that is, it has more unknowns (weights) than equations. In this context, the rank of A is equal to N and the GCMAC exactly models the desired hyper-surface. Even though the number of adjustable weights is greater than N, i.e. there are more weights than in a full Look-Up-Table (LUT), the generalisation capability of GCMAC speeds up the rate of convergence with respect to the (less complex) LUT.

6. CONCLUSIONS 

This paper has considered some features of the class of functions which the Generalized CMAC can and cannot model. Because the GCMAC network generalises, a multivariate LUT cannot be exactly approximated. A set of restrictions on the hyper-surface was derived for a perfect approximation. These restrictions can be interpreted as a simple smoothness condition. Therefore, the GCMAC network, like the CMAC, is only able to represent smooth hyper-surfaces. However, the hyper-surfaces that can be modelled by the GCMAC have more degrees of freedom than those approximated by the standard CMAC. This result stems from the flexibility gained with a variable generalisation vector. Finally, a set of functions orthogonal to the basis functions of GCMAC was found.
ABSTRACT

The paper proposes a neural network technique to adaptively model and characterize digital satellite channels. The neural network model makes it possible to identify each component of the channel using the channel input-output signals as learning data. This technique can be applied to failure detection in digital satellite links, especially failures arising in on-board devices. The paper gives some simulation examples of changes occurring in the on-board filters. Our adaptive method makes it possible to determine the origin of the changes and gives the new channel characteristics.

1. INTRODUCTION 

Digital satellite channels are composed of linear (e.g. linear filters) and non linear (e.g. travelling wave tubes (TWT)) devices. Classical adaptive techniques used to identify these channels (such as Volterra series approaches [1]) can only give a model of the channel input-output relationship, and are not able to characterize each component of the channel. When a failure happens in the satellite link (which may concern one or more on-board devices), it is impossible to determine its origin if we cannot identify each component of the channel.

In a recent paper [5], we proposed a general structure, the adaptive non linear enhancer (ANLE), for non linear channel identification. The ANLE is an adaptive neural network structure which makes it possible not only to model the global non linear channel input-output relationship, but also to characterize each component of the channel (the learning process is performed using the channel input-output signals). [3] and [4] present other applications of neural networks to satellite communications.

This work has been supported in part by the French Space Agency (CNES) under contract 962/94/CNES/1232/00.

In this paper, we use an ANLE structure to model satellite channels equipped with TWT amplifiers. We analyze the capability of the neural network to model the channel components. This technique is then applied to failure detection. We give some simulation examples of changes in the characteristics of the on-board filters. Our adaptive method makes it possible to locate the origins of the changes and gives the new characteristics of the channel.

The paper is organized as follows. Section 2 describes digital satellite channels. In section 3 we present the ANLE structure and its application to the identification problem. The application to failure detection is given in section 4.

2. DIGITAL SATELLITE CHANNELS 

A satellite channel consists of two earth stations connected by a repeater (the satellite) through two radio links (uplink and downlink). As an example, consider the simplified scheme of figure 1, modelled in complex baseband. The transmission filter F0, the IMUX (input multiplexing) filter F1, and the OMUX (output multiplexing) filter F2 are linear. The TWT acts as a memoryless nonlinearity whose complex transfer function depends only on the input complex envelope. It exhibits two kinds of non linearity: amplitude distortion (AM/AM conversion) and phase distortion (AM/PM conversion) [1, 7].



Figure 1: A simplified satellite channel. 
The satellite channel used in this paper (figure 1) has the following characteristics. The signals are QPSK modulated; the transmission filter F0 is a four-pole Chebyshev filter with 3 dB bandwidth …; F1 is a four-pole Chebyshev filter with 3 dB bandwidth …; and F2 is a four-pole Chebyshev filter with 3 dB bandwidth …. The TWT AM/AM and AM/PM conversions are represented by the Saleh model [7]:

$$A(r) = \frac{\alpha_a r}{1 + \beta_a r^2}, \qquad \Phi(r) = \frac{\alpha_\phi r^2}{1 + \beta_\phi r^2},$$

where $r$ is the TWT input amplitude, $\alpha_a = 2$, $\beta_a = 1$, $\alpha_\phi = 4.0033$, and $\beta_\phi = 9.104$.
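As an illustration, the Saleh conversions with the parameter values quoted above can be evaluated in a few lines of Python. This is a minimal sketch, not the authors' simulator: the function names are hypothetical and $\Phi$ is taken in radians. With these values $A(r)$ peaks at $r = 1/\sqrt{\beta_a} = 1$, where $A(1) = \alpha_a/(2\sqrt{\beta_a}) = 1$, so the saturation point sits at unit input and unit output amplitude.

```python
import cmath
import math

# Saleh TWT model with the parameters quoted in the text.
ALPHA_A, BETA_A = 2.0, 1.0
ALPHA_P, BETA_P = 4.0033, 9.104


def am_am(r: float) -> float:
    """AM/AM conversion A(r) = alpha_a r / (1 + beta_a r^2)."""
    return ALPHA_A * r / (1.0 + BETA_A * r * r)


def am_pm(r: float) -> float:
    """AM/PM conversion Phi(r) = alpha_phi r^2 / (1 + beta_phi r^2), in radians."""
    return ALPHA_P * r * r / (1.0 + BETA_P * r * r)


def twt(u: complex) -> complex:
    """Memoryless TWT acting on one complex baseband sample u = r e^{j theta}."""
    r, theta = abs(u), cmath.phase(u)
    return cmath.rect(am_am(r), theta + am_pm(r))


if __name__ == "__main__":
    for r in (0.25, 0.5, 1.0, 1.5):
        print(f"r={r:4.2f}  A(r)={am_am(r):.4f}  Phi(r)={math.degrees(am_pm(r)):6.2f} deg")
```

Because the model is memoryless, applying `twt` sample by sample to the output of F1 gives the input of F2 in a simple channel simulation.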


3. CHANNEL IDENTIFICATION

3.1. IDENTIFICATION STRUCTURE

In a recent paper [5], we proposed a general structure, the adaptive non linear enhancer (ANLE), for non linear system identification. The ANLE is a neural network structure which not only models the global non linear input-output relationship of the system, but also identifies each component of the system (the learning process is performed using the system input-output signals).

In this paper, we use an ANLE structure to model satellite channels equipped with TWT amplifiers, and we analyze the capability of the neural network to model the channel components.

The ANLE structure of figure 2 is used to model the block F1-TWT-F2 (the satellite itself). The ANLE copies the structure to be identified (a memoryless non linear system between two linear systems): it is composed of a linear subnetwork (PL1, with 60 weights), a non linear subnetwork (PNL, with 18 scalar neurons), and a second linear subnetwork (PL2, with 60 weights). Note that the amplitude and phase conversions of the non linear subnetwork depend only on the input signal amplitude (as in the TWT). The learning procedure is performed by presenting a pair of channel input-output complex signals to the neural network at each iteration.

The ANLE works as follows. The first linear part (PL1) filters the complex-valued input $x(n)$ (real FIR filtering); its output is

$$y(n) = \sum_{k=1}^{N_1} w_{1k}\, x(n-k+1),$$

where $N_1$ is the memory of PL1. The two non linear subnetworks correspond to the gain (G) and phase (P) conversions, respectively. The squared amplitude $\rho$ of the output $y$ of PL1 is presented to both non linear subnetworks. Their outputs $G$ and $\Phi$ can be written as:

$$G(\rho(n)) = \sum_{k=1}^{N_G} w_{G2k}\, f\big(w_{G1k}\,\rho(n) + b_{G1k}\big) + b_{G2},$$

$$\Phi(\rho(n)) = \sum_{k=1}^{N_P} w_{P2k}\,\Big[f\big(w_{P1k}\,\rho(n) + b_{P1k}\big) - f\big(b_{P1k}\big)\Big],$$

where $\rho(n) = r^2(n) = \|y(n)\|^2$. Note that the origin of the phase is 0 by construction ($\Phi(0) = 0$).

The PNL output can be written as:

$$z(n) = G(\rho(n))\, e^{j\Phi(\rho(n))}\, y(n).$$

We present to PL2 the vector $\mathbf{z}(n) = [z(n), z(n-1), \ldots, z(n-N_2+1)]^T$, where $N_2$ is the memory of PL2. The output $s(n)$ of the ANLE is then:

$$s(n) = \sum_{i=1}^{N_2} w_{2i}\, z(n-i+1).$$
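The forward pass described above (PL1, then the G and P subnetworks, then PL2) can be sketched as follows. This is an illustrative sketch under stated assumptions: the activation $f$ is taken to be tanh (the section does not name it), all weights are real, both filters start from a zero state, and the class and field names are hypothetical.

```python
import cmath
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ANLE:
    """Forward pass of an ANLE-like model: PL1 -> PNL (G and P subnets) -> PL2.

    Assumption: f = tanh. All weights are real, as in the text.
    """
    w1: List[float]   # PL1 taps (N1 = len(w1))
    wg1: List[float]  # G subnet: first layer weights
    bg1: List[float]  # G subnet: first layer biases
    wg2: List[float]  # G subnet: second layer weights
    bg2: float        # G subnet: second layer bias
    wp1: List[float]  # P subnet: first layer weights
    bp1: List[float]  # P subnet: first layer biases
    wp2: List[float]  # P subnet: second layer weights
    w2: List[float]   # PL2 taps (N2 = len(w2))

    def gain(self, rho: float) -> float:
        """G(rho) = sum_k w_G2k f(w_G1k rho + b_G1k) + b_G2."""
        return sum(v * math.tanh(w * rho + b)
                   for w, b, v in zip(self.wg1, self.bg1, self.wg2)) + self.bg2

    def phase(self, rho: float) -> float:
        """Phi(rho); subtracting f(b_P1k) gives Phi(0) = 0 by construction."""
        return sum(v * (math.tanh(w * rho + b) - math.tanh(b))
                   for w, b, v in zip(self.wp1, self.bp1, self.wp2))

    def run(self, x: List[complex]) -> List[complex]:
        """Process a whole input sequence and return s(n) for every n."""
        def fir(taps, seq):
            # out(n) = sum_k taps[k] seq(n - k), zero initial state
            return [sum(t * seq[n - k] for k, t in enumerate(taps) if n - k >= 0)
                    for n in range(len(seq))]

        y = fir(self.w1, x)
        z = []
        for yn in y:
            rho = abs(yn) ** 2
            z.append(self.gain(rho) * cmath.exp(1j * self.phase(rho)) * yn)
        return fir(self.w2, z)
```

Keeping the three blocks separate in the code mirrors the point of the ANLE: after training, `w1`, the two subnets and `w2` can be inspected individually as models of F1, the TWT and F2.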



Figure 2: Identification structure 


3.2. ALGORITHM 

We adjust the network weights using a gradient descent algorithm which minimizes the squared error between the channel output $d(n)$ and the ANLE output $s(n)$. The squared error is:

$$\|e(n)\|^2 = \|d(n) - s(n)\|^2 = e_R^2(n) + e_I^2(n),$$

where R and I denote the real and imaginary parts, respectively.

The PL2 weights are updated as:

$$w_{2i}(n+1) = w_{2i}(n) + \mu\big(z_R(n-i+1)\,\delta_{sR}(n) + z_I(n-i+1)\,\delta_{sI}(n)\big),$$

where $\delta_{sR}(n) = e_R(n)$ and $\delta_{sI}(n) = e_I(n)$.
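A minimal sketch of the PL2 update, assuming the most recent $N_2$ PNL outputs are kept in a list with $z(n)$ first; the helper names are hypothetical. For a small step size a single update lowers the instantaneous squared error, which the check after the code illustrates.

```python
from typing import List


def pl2_output(w2: List[float], z_hist: List[complex]) -> complex:
    """s(n) = sum_i w_2i z(n - i + 1); z_hist[0] is z(n), z_hist[1] is z(n-1), ..."""
    return sum(w * z for w, z in zip(w2, z_hist))


def pl2_update(w2: List[float], z_hist: List[complex], d: complex, mu: float) -> List[float]:
    """One step of w_2i <- w_2i + mu (z_R e_R + z_I e_I), with e = d - s."""
    e = d - pl2_output(w2, z_hist)
    return [w + mu * (z.real * e.real + z.imag * e.imag) for w, z in zip(w2, z_hist)]


if __name__ == "__main__":
    w2, z, d = [0.2, 0.1, 0.0], [1 + 0j, 0.5j, -0.3 + 0j], 1 + 1j
    before = abs(d - pl2_output(w2, z)) ** 2
    after = abs(d - pl2_output(pl2_update(w2, z, d, 0.01), z)) ** 2
    print(before, after)
```

The weights stay real even though the signals are complex: each tap receives the real part of $e^*(n)\,z(n-i+1)$, which is exactly $z_R e_R + z_I e_I$.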

We now present the updating rule of subnetwork G. The second layer weights of subnetwork G are updated as:

$$w_{G2k}(n+1) = w_{G2k}(n) + \mu\, x_{G1k}\,\delta_{G2},$$

where $x_{G1k}$ is the output of neuron $k$ of the first layer:

$$x_{G1k} = f(net_{G1k}), \qquad net_{G1k} = w_{G1k}\,\rho + b_{G1k},$$

and $\delta_{G2}$ is an error term:

$$\delta_{G2} = 2 w_{21}\,(e_R u_R + e_I u_I),$$

with $u(n) = \partial z(n)/\partial G = e^{j\Phi(\rho(n))}\,y(n)$.

The second layer bias term is updated as:

$$b_{G2}(n+1) = b_{G2}(n) + \mu\,\delta_{G2}.$$

The first layer weights are updated as:

$$w_{G1k}(n+1) = w_{G1k}(n) + \mu\, w_{G2k}\,\delta_{G2}\,\rho\, f'(net_{G1k}).$$

The first layer biases are updated as:

$$b_{G1k}(n+1) = b_{G1k}(n) + \mu\, w_{G2k}\,\delta_{G2}\, f'(net_{G1k}).$$

We now present the updating rule of subnetwork P. The second layer weights of subnetwork P are updated as:

$$w_{P2k}(n+1) = w_{P2k}(n) + \mu\, x_{P1k}\,\delta_{P2},$$

where $x_{P1k}$ is the output of neuron $k$ of the first layer:

$$x_{P1k} = f(w_{P1k}\,\rho + b_{P1k}) - f(b_{P1k}),$$

and $\delta_{P2}$ is an error term:

$$\delta_{P2} = 2 G w_{21}\Big(e_R\big(-\sin\Phi\, y_R - \cos\Phi\, y_I\big) + e_I\big(\cos\Phi\, y_R - \sin\Phi\, y_I\big)\Big).$$

The first layer weights are updated as:

$$w_{P1k}(n+1) = w_{P1k}(n) + \mu\, w_{P2k}\,\delta_{P2}\,\rho\, f'(w_{P1k}\,\rho + b_{P1k}).$$

The first layer biases are updated as:

$$b_{P1k}(n+1) = b_{P1k}(n) + \mu\, w_{P2k}\,\delta_{P2}\,\big(f'(w_{P1k}\,\rho + b_{P1k}) - f'(b_{P1k})\big).$$
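The error terms $\delta_{G2}$ and $\delta_{P2}$ are the negative derivatives of the instantaneous squared error with respect to the PNL gain $G$ and phase $\Phi$, taken through the PL2 tap $w_{21}$ that multiplies $z(n)$. A quick way to check such closed forms is to compare them with central finite differences. The sketch below does this for a single sample, assuming $z(n) = G e^{j\Phi} y(n)$, lumping the remaining PL2 terms into a constant `other`, and evaluating $\delta_{G2} = 2w_{21}(e_R u_R + e_I u_I)$ with $u = e^{j\Phi} y = \partial z/\partial G$. All names are hypothetical.

```python
import cmath
import math


def cost(G, phi, y, w21, other, d):
    """J = |d - s|^2 with s = w21 * G e^{j phi} y + other (other = rest of the PL2 sum)."""
    s = w21 * G * cmath.exp(1j * phi) * y + other
    return abs(d - s) ** 2


def deltas(G, phi, y, w21, other, d):
    """Closed-form error terms delta_G2 = -dJ/dG and delta_P2 = -dJ/dphi."""
    u = cmath.exp(1j * phi) * y                # u = dz/dG
    e = d - (w21 * G * u + other)
    delta_g2 = 2 * w21 * (e.real * u.real + e.imag * u.imag)
    c, s = math.cos(phi), math.sin(phi)
    delta_p2 = 2 * G * w21 * (e.real * (-s * y.real - c * y.imag)
                              + e.imag * (c * y.real - s * y.imag))
    return delta_g2, delta_p2


def numeric_deltas(G, phi, y, w21, other, d, h=1e-6):
    """Central finite differences of -J with respect to G and phi."""
    dg = -(cost(G + h, phi, y, w21, other, d) - cost(G - h, phi, y, w21, other, d)) / (2 * h)
    dp = -(cost(G, phi + h, y, w21, other, d) - cost(G, phi - h, y, w21, other, d)) / (2 * h)
    return dg, dp


if __name__ == "__main__":
    args = (0.8, 0.3, 0.6 - 0.4j, 0.9, 0.1 + 0.05j, 0.5 + 0.2j)
    print(deltas(*args))
    print(numeric_deltas(*args))
```

The same comparison, applied to each weight in turn, is a cheap safeguard when implementing the full set of update rules by hand.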

Finally, the weight vector of the first linear part (PL1) is updated as:

$$w_{1i}(n+1) = w_{1i}(n) + \mu\Big[x_R(n-i+1)\big(2y_R\,\delta_y + G w_{21}(\cos\Phi\, e_R + \sin\Phi\, e_I)\big) + x_I(n-i+1)\big(2y_I\,\delta_y + G w_{21}(-\sin\Phi\, e_R + \cos\Phi\, e_I)\big)\Big],$$

where

$$\delta_y = \delta_{G2}\sum_{k=1}^{N_G} w_{G2k}\,w_{G1k}\,f'(net_{G1k}) + \delta_{P2}\sum_{k=1}^{N_P} w_{P2k}\,w_{P1k}\,f'(w_{P1k}\,\rho + b_{P1k}).$$


It is worth noting that the above algorithm has the same properties as the classical backpropagation algorithm [2, 6] (error backpropagation, parallelism, etc.). Note that the two non linear subnetworks can be adjusted in parallel and independently.

3.3. SIMULATION RESULTS 

In figure 3 we compare the neural network output (generalization) to that of the channel. The generalization MSE was $3.3 \times 10^{-4}$ (RMS = 0.018, SNR = 33.3 dB, with the TWT working at saturation). Figures 4-7 show that each element of the channel has been correctly characterized by the corresponding part of the ANLE.


Figure 3: Generalization performance.


Figure 4: Frequency response of F1 and the NN model.

Figure 5: Frequency response of F2 and the NN model.




Figure 7: AM/PM conversion (TWT and NN). 


4. DETECTION OF CHANGES IN 
SATELLITE CHANNEL 
CHARACTERISTICS 


The identification technique is now applied to the detection of changes in satellite channel characteristics, which may occur because of a failure.

In the simulation below, a 'big' change occurred in filter F1 (the other channel components were left unchanged). Figure 8 shows the frequency response of F1 before and after the change. Using the same ANLE structure as in the above section, the adaptive system determined the origin of the change (filter F1) and gave the new filter frequency response (figure 9).
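The section locates a change by re-identifying the channel and inspecting which part of the ANLE has moved. One hypothetical way to automate that inspection for the linear parts is to compare the magnitude responses of the PL1 and PL2 tap vectors identified before and after the change. The sketch below does this with an RMS relative-change score; the score, the 5% threshold and the function names are assumptions, not part of the paper, and a practical detector would also compare the G and P subnetworks.

```python
import cmath
import math
from typing import Dict, List


def freq_response(taps: List[float], n_points: int = 64) -> List[float]:
    """|H(f)| of a real FIR filter on n_points normalized frequencies in [-0.5, 0.5)."""
    out = []
    for m in range(n_points):
        f = -0.5 + m / n_points
        h = sum(t * cmath.exp(-2j * math.pi * f * k) for k, t in enumerate(taps))
        out.append(abs(h))
    return out


def relative_change(before: List[float], after: List[float]) -> float:
    """RMS change of a magnitude response, relative to the RMS of the 'before' response."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))
    den = math.sqrt(sum(b * b for b in before))
    return num / den


def locate_changes(before: Dict[str, List[float]], after: Dict[str, List[float]],
                   threshold: float = 0.05) -> List[str]:
    """Names of the identified linear blocks whose response moved by more than threshold."""
    return [name for name in before
            if relative_change(freq_response(before[name]),
                               freq_response(after[name])) > threshold]


if __name__ == "__main__":
    old = {"PL1": [0.5, 0.3, 0.1], "PL2": [0.6, 0.2]}
    new = {"PL1": [0.2, 0.5, 0.3], "PL2": [0.6, 0.2]}
    print(locate_changes(old, new))
```

Comparing magnitude responses rather than raw taps makes the score insensitive to how the identification happens to distribute gain within one filter, but it ignores phase; whether that is acceptable depends on the failure modes of interest.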


Figure 9: New filter characterized by PL1.


In the next simulation, a 'small' change occurred in filter F2 (the other channel components were left unchanged). Figure 10 shows the frequency response of F2 before and after the change. Using the same ANLE structure as in the above section, the adaptive system determined the origin of the change (filter F2) and gave the new filter frequency response (figure 11).

5. CONCLUSION

The paper proposed a new approach for channel identification and failure detection in digital satellite communications. For the identification process, we used the adaptive non linear enhancer (ANLE) structure [5], which characterizes both the linear and the non linear devices of the channel. The learning process is performed using only the channel input and output signals (i.e. we do not have access to the input-output signals of the individual channel components). The ANLE has few parameters (e.g. 18 scalar neurons were sufficient to model the non linear part). This technique was applied to failure detection in digital satellite links. The paper gave some simulation examples of changes which occurred in the on-board filters. Our adaptive method was able to determine the origins of the changes and gave the new characteristics of the satellite channel.

6. REFERENCES 

[1] S. Benedetto, E. Biglieri, and V. Castellani, Digital Transmission Theory, Prentice Hall International, Englewood Cliffs, New Jersey, 1987.

[2] S. Haykin, Neural Networks: A Comprehensive Foundation, IEEE Press, 1994.

[3] M. Ibnkahla, J. Sombrin, F. Castanie, and N. J. Bershad, “Neural networks for modelling nonlinear memoryless communication channels”, submitted to IEEE Trans. Communications, 1995.

[4] M. Ibnkahla and F. Castanie, “Neural networks for digital communications and signal processing: overview and new results”, in E. Biglieri and M. Luise (Eds.), Signal Processing for Digital Communications, Springer Verlag, London, 1996.

[5] M. Ibnkahla and F. Castanie, “Neural network identification of non linear channels: the adaptive non linear enhancer”, in Proceedings of





ON THE USE OF DERIVATIVE CONSTRAINTS TO CONTROL BEAMFORMING 
RESPONSE SHAPES AGAINST INTERFERING DIRECTIONS 

Jacques Fois Pelayo, Jose M. Paez Borrallo* 

E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid
Ciudad Universitaria s/n, 28040 Madrid
e-mail: “jfp@serv01.rpi.ses.alcatel.es”

*e-mail: “paez@gaps.ssr.upm.es”


I. INTRODUCTION 

The use of beamformers with derivative constraints has been studied by several authors over the last two decades. These derivative constraints have been aimed at obtaining a flat magnitude response near the direction of interest, and the authors have given either sufficient conditions or, as for instance in [1], necessary and sufficient conditions for achieving a flat main beam response. The purpose of this work is to show that derivative constraints can also be applied at the interferer locations, as well as in the main beam, so that possible fluctuations of the interferers do not significantly degrade the rejection at their locations. Indeed, in some applications the desired direction is nearly fixed while the interferer locations may vary. Of course, the problem could also be approached from another viewpoint by increasing the number of spatial null constraints over the region from which the interferers are expected to come. Although the results can be extended to the broadband case [1], this paper considers only the narrowband beamformer structure.

The paper is structured as follows. After the introduction, section II briefly reviews the basic concepts of the beamforming problem and introduces the Generalized Sidelobe Canceller, which is used in the remaining sections. Section III presents the equations used to obtain the derivative constraints, and section IV evaluates the performance of the derivative constraints by means of computer simulations.

II. BACKGROUND 

Let us consider an array of L isotropic elements distributed at known locations in xyz space. The beamformer output at any time n is given by

$$y(n) = \mathbf{w}^H\mathbf{x}(n) \qquad (1)$$

where $\mathbf{x}(n)$ is the data vector at time n, $\mathbf{w}$ is the weight vector, and H denotes the complex conjugate transpose. The derivative constraints will be given with respect to the magnitude response, with $\theta$ and $\phi$ being the elevation and azimuth angles, respectively, in xyz space. The magnitude response is defined as

$$F(\theta,\phi) = \mathbf{w}^H E[\mathbf{x}\mathbf{x}^H]\,\mathbf{w} = \mathbf{w}^H\mathbf{R}_{xx}\mathbf{w} = \sum_{i=1}^{L}\sum_{j=1}^{L} w_i^* w_j\, r_{ij} \qquad (2)$$

evaluated for a unit-power wavefront arriving from $(\theta,\phi)$, where L is the number of array sensors and $\tau_i$ the delay at sensor i (i = 1, 2, …, L) due to signal propagation with respect to the axis reference. The elements of the matrix are $r_{ij} = e^{-j\omega(\tau_i - \tau_j)}$, with

$$\tau_i = \frac{x_i\sin\theta\cos\phi + y_i\sin\theta\sin\phi + z_i\cos\theta}{c}$$

for any i, where $(x_i, y_i, z_i)$ defines the position of sensor i and c is the speed of propagation of the wavefront detected on the array.

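The minimum variance beamforming problem with constraints is formulated by minimizing the variance of the output $y(n)$ in (1),

$$\min_{\mathbf{w}}\; \mathbf{w}^H\mathbf{R}_{xx}\mathbf{w} \qquad (3)$$

subject to the following set of linear constraints

$$\mathbf{C}^H\mathbf{w} = \mathbf{f} \qquad (4)$$

where $\mathbf{R}_{xx}$ is the covariance matrix of the data vector $\mathbf{x}$, $\mathbf{C}$ the constraint matrix, and $\mathbf{f}$ the vector containing the magnitude and phase wanted at the output for every constraint.

As is well known, the solution to the problem is given by

$$\mathbf{w}_{opt} = \mathbf{R}_{xx}^{-1}\mathbf{C}\left(\mathbf{C}^H\mathbf{R}_{xx}^{-1}\mathbf{C}\right)^{-1}\mathbf{f} \qquad (5)$$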

The idea of controlling the magnitude response to achieve a flat main beam response is not new. Sufficient derivative constraints for obtaining this flat main beam response were derived by Er & Cantoni [2], and later studies by Er [3] and Tseng [1] obtained conditions that are also necessary.

There is another way of depicting a beamformer structure, different from the conventional one. It is found by noting that a particular solution of the minimum variance beamforming problem is the so-called quiescent solution, obtained from (5) when only uncorrelated noise is present:

$$\mathbf{w}_q = \mathbf{C}\left(\mathbf{C}^H\mathbf{C}\right)^{-1}\mathbf{f} \qquad (6)$$

Figure 1 shows the beamformer structure known as the Generalized Sidelobe Canceller (GSC), introduced by Griffiths & Jim in [4]. The optimum solution $\mathbf{w}_{opt}$ is then the sum of two vectors (see figure 1): the quiescent vector from the upper branch, which depends only on the constraints, and a data-dependent vector from the lower branch. The GSC structure with derivative constraints has also been studied by Buckley & Griffiths [5].

Figure 1. Generalized Sidelobe Canceller.

The intention of this work is to specify derivative constraints not only in the look (main) direction, as usual, but also at the interfering steering locations.

III. OBTAINING THE SUFFICIENT LINEAR DERIVATIVE CONSTRAINTS

Sufficient null derivative constraints on the magnitude response are derived for both the linear array and the circular array, and each case is treated in two subcases: zero-plus-first (ZF) derivative constraints and zero-plus-first-plus-second (ZFS) derivative constraints.

III.1 Linear array case.

Let us suppose the array is located along the z axis and the sensors are equally spaced by half a wavelength, $\lambda/2$. In this case the data at each sensor, and hence the magnitude response, are independent of the azimuth $\phi$.

III.1.a ZF derivative constraints.

Consider a signal scenario with an interferer coming from $\theta_i$. Taking the first derivative of the magnitude response (2) with respect to $\theta$ yields

$$\frac{\partial F}{\partial\theta} = -j\omega\left(\mathbf{w}^H\mathbf{A}_\theta\mathbf{R}_{xx}\mathbf{w} - \mathbf{w}^H\mathbf{R}_{xx}\mathbf{A}_\theta\mathbf{w}\right) \qquad (7)$$

which we set equal to zero to obtain the sufficient ZF linear derivative constraint

$$\mathbf{C}^H(\theta)\,\mathbf{A}_\theta\,\mathbf{w}\big|_{\theta_i} = 0 \qquad (8)$$

where

$$\mathbf{A}_\theta = \mathrm{diag}\left(\frac{\partial\tau_1}{\partial\theta}, \frac{\partial\tau_2}{\partial\theta}, \ldots, \frac{\partial\tau_L}{\partial\theta}\right) \qquad (9)$$
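Equation (7) can be checked numerically for the linear array of this section. The sketch below assumes half-wavelength spacing along z with sensor i at $z_i = i\lambda/2$ (i = 0, …, L-1), so that $\omega\tau_i = \pi i\cos\theta$; it folds the factor $\omega$ into the diagonal of $\mathbf{A}_\theta$ and uses $\mathbf{R}_{xx} = \mathbf{c}\mathbf{c}^H$ with $c_i = e^{-j\omega\tau_i}$, which reproduces $r_{ij} = e^{-j\omega(\tau_i - \tau_j)}$. The names are hypothetical; the point is only that the closed form (7) matches a central finite difference of $F$.

```python
import cmath
import math
from typing import List

L = 8  # number of sensors (the 8-element array of section IV)


def steering(theta: float) -> List[complex]:
    """c_i = exp(-j pi i cos(theta)) for a half-wavelength array along z (i = 0..L-1)."""
    return [cmath.exp(-1j * math.pi * i * math.cos(theta)) for i in range(L)]


def a_theta(theta: float) -> List[float]:
    """Diagonal of omega * A_theta: d(omega tau_i)/d(theta) = -pi i sin(theta)."""
    return [-math.pi * i * math.sin(theta) for i in range(L)]


def F(w: List[complex], theta: float) -> float:
    """Magnitude response F = w^H c c^H w = |c^H w|^2."""
    g = sum(ci.conjugate() * wi for ci, wi in zip(steering(theta), w))
    return abs(g) ** 2


def dF_analytic(w: List[complex], theta: float) -> float:
    """Equation (7): -j omega (w^H A R w - w^H R A w) with R = c c^H."""
    c, a = steering(theta), a_theta(theta)
    wAc = sum(wi.conjugate() * ai * ci for wi, ai, ci in zip(w, a, c))
    cw = sum(ci.conjugate() * wi for ci, wi in zip(c, w))
    wc = sum(wi.conjugate() * ci for wi, ci in zip(w, c))
    cAw = sum(ci.conjugate() * ai * wi for ci, ai, wi in zip(c, a, w))
    return (-1j * (wAc * cw - wc * cAw)).real  # imaginary part is zero up to rounding


if __name__ == "__main__":
    w = [complex(0.3 * i - 1.0, 0.2 * (i % 3)) for i in range(L)]
    theta, h = 0.7, 1e-6
    print(dF_analytic(w, theta), (F(w, theta + h) - F(w, theta - h)) / (2 * h))
```

Written this way, (7) also makes the later remark visible: both products contain the factor $\mathbf{c}^H\mathbf{w}$ or its conjugate, so the derivative vanishes wherever a spatial null constraint holds.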


We would like to comment on one point before continuing the analysis. Equation (8) has been presented as a sufficient condition for making the first derivative of the magnitude response equal to zero, but in fact this derivative is already zero at $\theta_i$ without it. Indeed, we know that

$$\mathbf{C}^H(\theta)\,\mathbf{w}\big|_{\theta_i} = 0$$

already holds because it corresponds to one of the spatial constraints, and this alone makes (7), evaluated at $\theta_i$, equal to zero without any additional constraint (unless spatial constraints with nonzero values are used). Nevertheless, using equation (8) as a constraint leads to a flat response over the interfering region, as desired.


Figure 2. Zoom of the power response first derivative function for a linear array with an interferer at 30°.


The reason becomes clear from figure 2, which shows two first derivative functions of the magnitude response for a linear array: one obtained with ZF derivative constraints and the other with spatial constraints only. The derivative constraints act as a second zero of the first derivative function, and this is why the magnitude response becomes flat.


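III.1.b ZFS derivative constraints.

Differentiating once more with respect to $\theta$, we obtain

$$\frac{\partial^2 F}{\partial\theta^2} = 2\omega^2\left(\mathbf{w}^H\mathbf{A}_\theta\mathbf{R}_{xx}\mathbf{A}_\theta\mathbf{w}\right) - \omega^2\left(\mathbf{w}^H\mathbf{A}_\theta^2\mathbf{R}_{xx}\mathbf{w} + \mathbf{w}^H\mathbf{R}_{xx}\mathbf{A}_\theta^2\mathbf{w}\right) - j\omega\left(\mathbf{w}^H\frac{\partial\mathbf{A}_\theta}{\partial\theta}\mathbf{R}_{xx}\mathbf{w} - \mathbf{w}^H\mathbf{R}_{xx}\frac{\partial\mathbf{A}_\theta}{\partial\theta}\mathbf{w}\right) \qquad (10)$$

from which, setting it equal to zero, it is possible to obtain the following equations:

$$\mathbf{C}^H(\theta)\,\mathbf{A}_\theta\,\mathbf{w}\big|_{\theta_i} = 0 \qquad (11a)$$

$$\mathbf{C}^H(\theta)\,\mathbf{A}_\theta^2\,\mathbf{w}\big|_{\theta_i} = 0 \qquad (11b)$$

$$\mathbf{C}^H(\theta)\,\frac{\partial\mathbf{A}_\theta}{\partial\theta}\,\mathbf{w}\big|_{\theta_i} = 0 \qquad (11c)$$

Notice that equation (8), derived for the ZF case, shows up again in (11a), and that equation (11c) is linearly dependent on (8) because of the linear array geometry. Consequently, we take equation (11b) as the linear derivative constraint for the ZFS case.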
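To see the constraints at work, the quiescent solution (6) can be computed for an 8-element half-wavelength linear array with a unit-gain constraint at broadside, a null at $\theta_i = 30°$, and either the ZF constraint (8) or the ZFS pair (8) plus (11b) at the interferer. This is a sketch under assumptions: sensor positions $z_i = i\lambda/2$ (i = 0, …, L-1), the factor $\omega$ folded into $\mathbf{A}_\theta$, a small hand-written complex Gaussian elimination in place of a linear-algebra library, and hypothetical names throughout. The checks confirm that the beam gain $g(\theta) = \mathbf{c}^H(\theta)\mathbf{w}$ has zero slope at $\theta_i$ for both designs and also zero curvature for ZFS, which is what flattens the null.

```python
import cmath
import math
from typing import List

L = 8                          # sensors, half-wavelength spacing along z (assumed)
LOOK = math.radians(90.0)      # broadside look direction
INTERF = math.radians(30.0)    # interferer direction theta_i


def steering(theta: float) -> List[complex]:
    """c_i(theta) = exp(-j pi i cos(theta)), i.e. exp(-j omega tau_i) with z_i = i lambda / 2."""
    return [cmath.exp(-1j * math.pi * i * math.cos(theta)) for i in range(L)]


def a_theta(theta: float) -> List[float]:
    """Diagonal of omega * A_theta: d(omega tau_i)/d(theta) = -pi i sin(theta)."""
    return [-math.pi * i * math.sin(theta) for i in range(L)]


def solve(M: List[List[complex]], b: List[complex]) -> List[complex]:
    """Solve the small complex system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= factor * A[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def quiescent(cols: List[List[complex]], f: List[complex]) -> List[complex]:
    """Equation (6): w_q = C (C^H C)^{-1} f, with the constraint vectors as the columns of C."""
    gram = [[sum(ui.conjugate() * vi for ui, vi in zip(u, v)) for v in cols] for u in cols]
    alpha = solve(gram, f)
    return [sum(al * col[i] for al, col in zip(alpha, cols)) for i in range(L)]


def design(order: int) -> List[complex]:
    """order=1: ZF, constraint (8); order=2: ZFS, (8) plus (11b). Derivatives at theta_i only."""
    ci, a = steering(INTERF), a_theta(INTERF)
    cols = [steering(LOOK), ci]          # unit gain at the look direction, null at theta_i
    f = [1 + 0j, 0j]
    vec = ci
    for _ in range(order):
        vec = [ak * vk for ak, vk in zip(a, vec)]  # A^m c(theta_i); A is real, so
        cols.append(vec)                            # (A^m c)^H w = c^H A^m w
        f.append(0j)
    return quiescent(cols, f)


def g(w: List[complex], theta: float) -> complex:
    """Beam gain c^H(theta) w, so that F(theta) = |g|^2."""
    return sum(ci.conjugate() * wi for ci, wi in zip(steering(theta), w))


def g_second(w: List[complex], theta: float) -> complex:
    """d^2 g / d theta^2 = sum_i (j psi_i'' - psi_i'^2) e^{j psi_i} w_i, psi_i = pi i cos(theta)."""
    total = 0j
    for i, (ci, wi) in enumerate(zip(steering(theta), w)):
        d1 = -math.pi * i * math.sin(theta)
        d2 = -math.pi * i * math.cos(theta)  # = cot(theta) * d1, so (11c) adds nothing new
        total += (1j * d2 - d1 * d1) * ci.conjugate() * wi
    return total
```

The comment in `g_second` reflects the statement above that (11c) is linearly dependent on (8) for a linear array: $\partial\mathbf{A}_\theta/\partial\theta = \cot\theta\,\mathbf{A}_\theta$.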


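III.2 Circular array case.

Now we assume the sensors are equally spaced along a circular ring of radius d in the xy plane. This time the data depend on both spatial angles $\theta$ and $\phi$, and hence the magnitude response must be differentiated with respect to $\theta$ as well as with respect to $\phi$.

III.2.a ZF derivative constraints.

Proceeding in the same way as for the linear case, we obtain

$$\frac{\partial F}{\partial\theta} = -j\omega\left(\mathbf{w}^H\mathbf{A}_\theta\mathbf{R}_{xx}\mathbf{w} - \mathbf{w}^H\mathbf{R}_{xx}\mathbf{A}_\theta\mathbf{w}\right) \qquad (12)$$

$$\frac{\partial F}{\partial\phi} = -j\omega\left(\mathbf{w}^H\mathbf{A}_\phi\mathbf{R}_{xx}\mathbf{w} - \mathbf{w}^H\mathbf{R}_{xx}\mathbf{A}_\phi\mathbf{w}\right) \qquad (13)$$

with $\mathbf{A}_\phi = \mathrm{diag}(\partial\tau_1/\partial\phi, \ldots, \partial\tau_L/\partial\phi)$, so that the sufficient constraints are

$$\mathbf{C}^H(\theta,\phi)\,\mathbf{A}_\theta\,\mathbf{w}\big|_{(\theta_i,\phi_i)} = 0 \qquad (14a)$$

$$\mathbf{C}^H(\theta,\phi)\,\mathbf{A}_\phi\,\mathbf{w}\big|_{(\theta_i,\phi_i)} = 0 \qquad (14b)$$

where the pair $(\theta_i, \phi_i)$ defines the interfering direction.

III.2.b ZFS derivative constraints.

In this case the development leads to a set of second-derivative equations, (15)-(18); for instance, the mixed derivative is

$$\frac{\partial^2 F}{\partial\theta\,\partial\phi} = \omega^2\left(\mathbf{w}^H\mathbf{A}_\phi\mathbf{R}_{xx}\mathbf{A}_\theta\mathbf{w} + \mathbf{w}^H\mathbf{A}_\theta\mathbf{R}_{xx}\mathbf{A}_\phi\mathbf{w}\right) - \omega^2\left(\mathbf{w}^H\mathbf{A}_\phi\mathbf{A}_\theta\mathbf{R}_{xx}\mathbf{w} + \mathbf{w}^H\mathbf{R}_{xx}\mathbf{A}_\theta\mathbf{A}_\phi\mathbf{w}\right) - j\omega\left(\mathbf{w}^H\frac{\partial\mathbf{A}_\theta}{\partial\phi}\mathbf{R}_{xx}\mathbf{w} - \mathbf{w}^H\mathbf{R}_{xx}\frac{\partial\mathbf{A}_\theta}{\partial\phi}\mathbf{w}\right)$$

From equations (15)-(18) we can derive the following linearly independent sufficient derivative constraints:

(19a)

(19b)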


IV. COMPUTER STUDIES 

Software simulations have been carried out to show the impact of the derivative constraints on the beamformer output pattern. The examples illustrated in this section consider a very simple scenario consisting of one desired look direction and one interferer, assuming spatial constraints applied at the interferer as well as in the look direction, and derivative null constraints applied only at the interferer.
We begin by considering a linear array of 8 antenna elements equally spaced by half a wavelength, and a scenario consisting of a look direction at broadside and an interferer arriving at $\theta_i = 30°$. Figure 3 shows the beamformer response, where it can be seen how the response broadens at the interferer location as derivative constraints are inserted. A zoom of the region close to the interference is depicted in figure 4, where the flat behaviour can be appreciated. Notice that, without derivative constraints, any mistuning of the interferer is very harmful to the magnitude response, clearly leading to a significant loss of attenuation. This is better illustrated in figure 5, which shows how the attenuation degrades in the three cases treated here when the interfering frequency changes.

Next we consider the case of the circular array. We assume the array sensors are located along a circular ring of radius d such that kd = L, with $k = 2\pi/\lambda$.

Figure 6 depicts the beamformer response for the circular array case, which is symmetric because of the circular geometry. The conclusions drawn for the linear case also apply to the circular case. The flat behaviour is also shown in figure 7, which is a zoom of figure 6. A figure similar to figure 5, with a very similar shape, could be given for the circular array, but it is not provided in this paper.



Figure 3. Beamformer response for a scenario with a 0 dB desired signal coming from the broadside direction and a 7 dB interference at $\theta_i = 30°$, together with a -20 dB uncorrelated white noise.

Figure 4. Zoom of figure 3.


Figure 5. Attenuation at $\theta_i$ when changes affect the interfering frequency. 0% means the interfering frequency has not been affected.



Figure 6. Beamformer pattern, fixing $\phi = 90°$, for a circular array of 8 isotropic elements with a look direction steered at $(\theta,\phi) = (90°, 90°)$, an interference at $(\theta,\phi) = (30°, 90°)$, and a -20 dB uncorrelated white noise.



We have seen so far how the use of derivative constraints affects the shape of the beamformer output response. One may now look for an equivalent beamformer response that uses no derivative constraints but only spatial constraints, where "equivalent" here means "using the same number of constraints in matrix C". Spreading out the magnitude response around the interferer location with spatial constraints is clearly an alternative to the use of derivative constraints, as depicted in figure 8. This figure represents the same linear array example as figure 3, but compares the ZF case (solid line) with a response (dashed line) obtained with two spatial constraints, so that the two responses are equivalent. First, note that two different behaviours can be distinguished: the derivative constraints yield a maximally flat behaviour, whereas spatial constraints alone produce a ripple behaviour.


Figure 8. Linear array with an interferer at $\theta_i = 30°$. ZF case vs. response with spatial constraints at $\theta_1 = 27°$ and $\theta_2 = 33°$.


Figure 8 shows that the ripple behaviour can make the response unsatisfactory if, for instance, a response below -60 dB is required at the interferer, although the spatial constraints perform better at the edges. This can be overcome by changing the spatial constraints so that the ripple becomes lower, as shown in figure 9; nevertheless, the derivative constraints now give a better response.


Figure 9. Linear array with an interferer at $\theta_i = 30°$. ZF case vs. response with spatial constraints at $\theta_1 = 29°$ and $\theta_2 = 31°$.


V. CONCLUSIONS 

Although derivative constraints have been widely studied in the literature, their application to the interferers has not appeared so far. The reason is that equation (7) is zero at the interferers without any additional condition; nevertheless, the derived constraints play an important role against mistuning conditions due, for example, to interferers at the band edges, Doppler effects, or estimated interfering directions.

Specifying derivative constraints at the interfering directions improves the magnitude performance of a beamformer, and can be seen as an alternative to adding null spatial constraints in the region nearest to the interference.

Moreover, it provides a way of controlling the response by combining constraints specified in the main beam with constraints specified at the interferers.

The examples shown in section IV are particular, because the beamformer response changes with a different signal scenario or with a different number of antenna elements; in general, however, they are good enough to give an overall view of what can be expected when derivative constraints are used.

VI. REFERENCES

[1] C.-Y. Tseng, “Minimum variance beamforming with phase-independent derivative constraints”, IEEE Trans. Antennas & Propagation, vol. 40, no. 3, pp. 285-294, Mar. 1992.

[2] M. H. Er and A. Cantoni, “Derivative constraints for broad-band element space antenna array processor”, IEEE Trans. ASSP, vol. 31, no. 6, pp. 1378-1393, Dec. 1983.

[3] M. H. Er, “Adaptive antenna array under directional and spatial derivative constraints”, Proc. IEE, vol. 135, pt. H, no. 6, pp. 414-419, Dec. 1988.

[4] L. J. Griffiths and C. W. Jim, “An alternative approach to linearly constrained adaptive beamforming”, IEEE Trans. Antennas & Propagation, vol. 30, pp. 27-34, Jan. 1982.

[5] K. M. Buckley and L. J. Griffiths, “An adaptive generalized sidelobe canceller with derivative constraints”, IEEE Trans. Antennas & Propagation, vol. 34, no. 3, pp. 311-319, Mar. 1986.





































Adaptive Systems 
in Communications 



BLIND ADAPTIVE MUD WITH SILENCE LISTENING 


Enrico Del Re and Luca Simone Ronga 


Dipartimento di Ingegneria Elettronica 
Universita degli Studi di Firenze 
Via di Santa Marta 3 
50133 Florence - ITALY 


ABSTRACT 

In communication systems adopting CDMA as the multiple-access protocol, multiuser detection (MUD) is the optimal choice both for a better utilization of the communication medium and for a reduction of the near-far effects. Several papers [4] [5] [3] in the literature suggest adaptive approaches to multiuser detection with various degrees of blindness with respect to the knowledge of the channel and signal characteristics.

The weakest aspect of those techniques is the need to know a close approximation of the signal used by the user whose information is sought. This paper presents an adaptive multiuser detector which estimates the user's signal by listening to the channel when the user is not transmitting.

The resulting detector is expected to behave well even when the time-varying channel continuously changes the signal shape of the user of interest.

Part I 

Introduction 

Recent works [4] [5] [6] have shown that multiuser detection radically improves the performance of a CDMA receiver. An attractive model of a blind adaptive MUD receiver has recently been presented by Madhow, Honig and Verdu [3]; its only requirements are the knowledge of the signature waveform of the desired user and the timing of all the users.

Even if the signal waveform of the desired user is not exactly known, the detector performs fairly well if a limit is imposed on the cancellation of interference.

This paper presents a blind MUD with an additional adaptive branch designed to correct the wrong estimate of the desired user's signature caused by a multipath channel. To reach this goal without training sequences from the transmitter, some additional information is assumed to be available at the receiver: the knowledge of the time periods when the desired user is silent, i.e. not transmitting any signal in the channel.

This work was financially supported by ASI and MURST.

Part II 

“Listen to the silence!” 

Inspection of the common characteristics of the information flowing in wireless systems reveals the substantially discontinuous nature of the information flow. In voice channels, over 30% of the time is not used to transmit any information. Discontinuities are present in data channels as well, depending on the nature of the connected system.

We shall now show that in CDMA systems the knowledge of a user's silence periods is useful for correctly estimating that user's signature waveform. Unlike training sequences, which may also be used to correct that estimate, silence periods are always present during transmission, so their use may easily be integrated into the communication devices.

The silence/not-silence information is a very slowly varying signal compared to the spreading signatures, and may be transmitted in a very narrow portion of the medium spectrum without any appreciable loss of capacity.

In more detail, the following signal is received in a CDMA environment:

$$y = \sum_{k=1}^{K} A_k b_k s_k + n \qquad (1)$$

where

$K$ is the number of users,

$A_k$ is the received amplitude of the k-th user's signal,

$b_k$ is the antipodal binary information from the k-th user,

$s_k$ is the received signal from the k-th user, here represented as a member of a Hilbert space $\mathcal{H}$,

$n$ is the projection onto $\mathcal{H}$ of a white Gaussian stochastic process with zero mean and variance equal to $\sigma^2$.

Figure 1: The general structure of the MUD receiver.
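For experimenting with model (1), a received vector can be synthesised on a chip-sampled real representation of the Hilbert space. The sketch below is illustrative only: the orthonormal Walsh-Hadamard signatures, the chip count and the function names are assumptions (the paper's received signatures are distorted by multipath, so they would not stay orthogonal). With orthogonal signatures and no noise, the matched-filter statistic $\langle y, s_1\rangle$ returns $A_1 b_1$ exactly.

```python
import random
from typing import List, Sequence


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    """Scalar product <a, b> on R^N (chip-sampled real signals, an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def received(amps: Sequence[float], bits: Sequence[int], sigs: Sequence[Sequence[float]],
             sigma: float, rng: random.Random) -> List[float]:
    """y = sum_k A_k b_k s_k + n, with n white Gaussian of standard deviation sigma per chip."""
    n_chips = len(sigs[0])
    y = [rng.gauss(0.0, sigma) for _ in range(n_chips)]
    for a, b, s in zip(amps, bits, sigs):
        for i in range(n_chips):
            y[i] += a * b * s[i]
    return y


def walsh(n: int) -> List[List[float]]:
    """n x n Sylvester-Hadamard codes (n a power of two), scaled to unit energy."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    scale = 1.0 / n ** 0.5
    return [[scale * x for x in row] for row in h]


if __name__ == "__main__":
    sigs = walsh(8)[:3]
    y = received([1.0, 5.0, 3.0], [-1, 1, 1], sigs, 0.1, random.Random(1))
    print("decision for user 1:", 1 if inner(y, sigs[0]) >= 0 else -1)
```

The near-far situation of the example (interferers 5 and 3 times stronger than user 1) is harmless here only because the signatures are orthogonal; replacing `sigs[0]` by a perturbed estimate exposes the sensitivity that the silence-listening branch is meant to reduce.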

We are interested in the first user's information. Due to multipath, only an estimate $\hat{s}_1$ of the first user's signal is available at the receiver.

The linear MUD receiver is composed of two components: the estimate $\hat{s}_1$ of the desired user's signal, and an orthogonal component $x_1$ dedicated to the suppression of the interferers.

For binary, antipodal, equiprobable signalling, the estimated information is given by a threshold detector which follows the decorrelator, as shown in fig. 1.

If a reliable estimate of the first user's signal is available, i.e. $\hat{s}_1 = s_1$, then $x_1$ may be adaptively modified in order to reduce the so-called Mean Output Energy, defined as

$$\mathrm{MOE}[x_1] = E\left[\langle y, \hat{s}_1 + x_1\rangle^2\right] \qquad (2)$$

where $E[\cdot]$ is the expectation and $\langle a, b\rangle$ is the scalar product defined over the Hilbert space chosen to represent the signals considered in the transmission.

When the waveform transmitted by user 1 is not exactly known, the adaptation rule of $x_1$ needs to be modified in order to prevent the cancellation of the desired signal. The blind adaptation rule described in [3] is modified with an additional constraint on the so-called surplus energy, defined as

$$\chi = \|c_1\|^2 - 1, \qquad c_1 = \hat{s}_1 + x_1 \qquad (3)$$



Figure 2: The two possible states of user 1 (Talk and Silence).

By imposing a value of $\chi$ in the blind adaptation rule, a lower degradation of the signal-to-interference ratio (SIR) is achieved, even in the presence of an estimation error of the desired user's signal.

In order to achieve a good SIR in the high-SNR region, silence period listening is introduced to move the estimate of the desired signal towards a signal less sensitive to the interference.

Two possible states are considered for the first user, as shown in fig. 2.

TALK, corresponding to a received signal

$$y = \sum_{k=1}^{K} A_k b_k s_k + n$$

SILENCE, with a received signal

$$y_s = \sum_{k=2}^{K} A_k b'_k s_k + n$$

We consider the quantity called Mean Silence Output Energy, defined as

$$\mathrm{MSiE}[\hat{s}_1] = E\left[\left(\langle y_s, \hat{s}_1\rangle + \langle y, x_1\rangle\right)^2\right] \qquad (4)$$

As shown in appendix A, in the low-noise region ($\sigma \to 0$) the mean silence output energy is equal to

$$\mathrm{MSiE}[\hat{s}_1] = \sum_{k=2}^{K} A_k^2\,\langle \hat{s}_1, s_k\rangle^2 \qquad (5)$$

The local minimization of this quantity, together with the unit-norm constraint on $\hat{s}_1$ ($\|\hat{s}_1\| = 1$), yields a correction of the estimate $\hat{s}_1$ towards a signal less sensitive to the interfering components of the received signal.
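Equation (5) can be checked directly in the noiseless case. Assuming $x_1 = 0$ and independent, equiprobable interferer bits, the cross terms $E[b_k b_l]$ vanish for $k \neq l$, so averaging $\langle y_s, \hat{s}_1\rangle^2$ over all bit patterns must reproduce $\sum_k A_k^2\langle \hat{s}_1, s_k\rangle^2$. The sketch below enumerates the patterns exhaustively; the vectors and names are hypothetical.

```python
import itertools
from typing import Sequence


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    """Scalar product on R^N."""
    return sum(x * y for x, y in zip(a, b))


def msie_exact(s1_hat, amps, sigs) -> float:
    """E[<y_s, s1_hat>^2] over all equiprobable interferer bit patterns (sigma = 0, x1 = 0)."""
    patterns = list(itertools.product((-1.0, 1.0), repeat=len(sigs)))
    total = 0.0
    for bits in patterns:
        y_s = [sum(a * b * s[i] for a, b, s in zip(amps, bits, sigs))
               for i in range(len(s1_hat))]
        total += inner(y_s, s1_hat) ** 2
    return total / len(patterns)


def msie_formula(s1_hat, amps, sigs) -> float:
    """Right-hand side of (5): sum over the interferers of A_k^2 <s1_hat, s_k>^2."""
    return sum(a * a * inner(s1_hat, s) ** 2 for a, s in zip(amps, sigs))


if __name__ == "__main__":
    s1_hat = [0.5, 0.5, 0.5, 0.5]
    sigs = [[0.5, -0.5, 0.5, 0.3], [0.1, 0.2, -0.7, 0.4]]
    amps = [2.0, 0.7]
    print(msie_exact(s1_hat, amps, sigs), msie_formula(s1_hat, amps, sigs))
```

Seen this way, minimizing (5) on the unit sphere is a minimum-eigenvector problem for the interference correlation matrix, started locally from the nominal signature.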

In the practical realization of the proposed receiver, the above-mentioned correction of the estimate $\hat{s}_1$ is performed when the user of interest is known to be in the silent state. During the talk state, instead, the adaptation of $\hat{s}_1$ is "frozen" while the adaptation branch of the canceller $x_1$ runs.

Figure 3: The proposed receiver.

It should also be noted that the minimization of MSiE does not lead to a signal closer to $s_1$; instead, it tends to minimize the components of $\hat{s}_1$ that lie along one or more interfering signals. The resulting signal, chosen among the unit-norm signals orthogonal to $x_1$, will intuitively have a relatively stronger component along $s_1$ (which bears the sought information), and its component along the interferers will also be reduced.

It is important to point out that the minimization of MSiE is local, with the initial value of $\hat s_1$ being the uncorrupted signal $s_1$ assigned to the user of interest.

If for any reason the adaptive algorithm falls into the attraction region of another signal that is equally orthogonal to the interference but has no component along $s_1$, the receiver will no longer be able to recover the information. To prevent this problem, a technique is under study that limits the amount of correction performed during silence periods when an attractor boundary is reached.

Part III 

The receiver ... 

The structure of the proposed receiver is shown in Fig. 3.

The stochastic gradient descent algorithm for the adaptation of $x_1[i]$ is fully described in [3], so only its final formulation is reported here.

For the adaptation rule of $\hat s_1$, we recall the expression of $\mathrm{MSiE}[\hat s_1]$:

$\mathrm{MSiE}[\hat s_1] = E\big[(\langle y_s, \hat s_1\rangle + \langle y, x_1\rangle)^2\big] = E[\langle y_s, \hat s_1\rangle^2] + E[\langle y, x_1\rangle^2] + 2E[\langle y_s, \hat s_1\rangle\langle y, x_1\rangle] \quad (6)$

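Its unconstrained stochastic gradient (with $\hat s_1$ as the variable vector) is

$\nabla\mathrm{MSiE}[\hat s_1] = 2\langle y_s, \hat s_1\rangle\,y_s + 2\langle y, x_1\rangle\,y_s = 2\big(\langle y_s, \hat s_1\rangle + \langle y, x_1\rangle\big)\,y_s \quad (7)$

The component of (7) orthogonal to $x_1$ is

$2\big(\langle y_s, \hat s_1\rangle + \langle y, x_1\rangle\big)\big(y_s - \langle y_s, x_1\rangle\,x_1\big) \quad (8)$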

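so the adaptation rule for $\hat s_1$ is (the factor 2 being absorbed into the step $\mu_s$)

$\hat s_1[i] = \hat s_1[i-1] - \mu_s\big(\langle y_s[i], \hat s_1[i-1]\rangle + \langle y[i], x_1[i-1]\rangle\big)\big(y_s[i] - \langle y_s[i], x_1[i-1]\rangle\,x_1[i-1]\big) \quad (9)$

From now on we will call

$\langle y[i], \hat s_1[i-1]\rangle = Z_{MF}[i] \quad (10)$

$\langle y[i], x_1[i-1]\rangle = Z_{OF}[i] \quad (11)$

$\langle \ldots[i-1]\rangle = Z_{OFs}[i] \quad (12)$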


It should be noted that the quantity $\langle y, x_1\rangle$ is not available at the same time as $\langle y_s, \hat s_1\rangle$. For that reason, during the talk period, the quantity $\langle y, x_1\rangle$ is continuously stored in a memory cell called $Z_{OFs}[i]$. When a state transition occurs at $i^*$, the last value $Z_{OFs}[i^*]$ is taken as an estimate of $Z_{OFs}[i]$ for $i > i^*$.

The overall adaptation rules for each state of the receiver are:

TALK

$x_1[i] = x_1[i-1]\,(1 - \mu_x\nu) - \mu_x Z[i]\,\big(y[i] - Z_{MF}[i]\,\hat s_1[i-1]\big) \quad (13)$

$\hat s_1[i] = \hat s_1[i-1] \quad \text{(holding)} \quad (14)$

$Z_{OFs}[i] = Z_{OF}[i] \quad \text{(storing)} \quad (15)$

SILENCE

$\hat s_1[i] = \hat s_1[i-1] - \mu_s\big(Z_{MF}[i] + Z_{OFs}[i]\big)\big(y[i] - Z_{OF}[i]\,x_1[i-1]\big) \quad (16)$


Number of “real” interfering users (K): 7
Amplitudes $A_k$, $k = 2, \dots, K$: $\sqrt{10}$
Talk/silence periods ratio: 1:1
Multipath index ($a$): 1.0

Table 1: Simulation parameters.


$x_1[i] = x_1[i-1] \quad \text{(holding)} \quad (17)$

$Z_{OFs}[i] = Z_{OFs}[i-1] \quad \text{(holding)} \quad (18)$

where

$\mu_x$, $\mu_s$ are the adaptation steps of the two adaptive branches. Under stationary conditions of the stochastic processes involved in the reception, the algorithm converges to the solution [2] if the steps decrease as $1/i$. In order to track channel variations, a lower bound is imposed on the steps.

$\nu$ is the Lagrange multiplier responsible for the upper bound on the surplus energy, as described in [3].
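The TALK/SILENCE switching of (13)-(18) can be summarized as a small state machine. The Python sketch below is illustrative only. The dictionary-based state and the helper name `receiver_step` are hypothetical, and so is the choice $Z[i] = Z_{MF}[i] + Z_{OF}[i]$ (assumed here to be the receiver output used in the blind rule of [3]). Rule (16) is applied to the current received vector $y[i]$, which during silence is the silence signal $y_s$.

```python
import numpy as np


def receiver_step(state, y, talking, mu_x, mu_s, nu):
    """Apply one symbol of the adaptation rules (13)-(18).

    state holds 's_hat' (signal estimate), 'x1' (canceller) and 'z_ofs'
    (memory cell Z_OFs). Returns the receiver output Z_MF + Z_OF."""
    s_hat, x1 = state["s_hat"], state["x1"]
    z_mf = y @ s_hat                                   # eq. (10)
    z_of = y @ x1                                      # eq. (11)
    if talking:
        z = z_mf + z_of                                # assumed receiver output Z[i]
        state["x1"] = x1 * (1 - mu_x * nu) - mu_x * z * (y - z_mf * s_hat)  # (13)
        state["z_ofs"] = z_of                          # (15) storing; (14) s_hat held
    else:
        # (16) corrects s_hat with the stored Z_OFs; (17), (18) hold x1 and Z_OFs
        state["s_hat"] = s_hat - mu_s * (z_mf + state["z_ofs"]) * (y - z_of * x1)
    return z_mf + z_of


rng = np.random.default_rng(1)
N = 31
s1 = rng.standard_normal(N)
s1 /= np.linalg.norm(s1)
state = {"s_hat": s1.copy(), "x1": np.zeros(N), "z_ofs": 0.0}

for i in range(20):                                    # 10 talk symbols, then 10 silent ones
    talking = i < 10
    y = rng.standard_normal(N)                         # placeholder received vectors
    before = {key: np.copy(val) for key, val in state.items()}
    receiver_step(state, y, talking, mu_x=0.01, mu_s=0.01, nu=0.0)
```

Keeping $Z_{OFs}$ in the state is what lets (16) use a talk-period value of $\langle y, x_1\rangle$ while the user is silent.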


Part IV 

... its testbed ... 

The performance of the receiver and the gain obtained by silence listening are evaluated by computer simulations. The simulated CDMA transmission system is characterized by:

1. DS-CDMA with a 31-chip Gold sequence [1] for the wanted user, with unit energy of the desired user's spreading sequence. If $s_1(t)$, $t \in [0, T_b]$, is the spreading sequence assigned to the first user, the received signal corrupted by multipath is

$h_1(t) = \dfrac{s_1(t) + a\,s_1(t - T_b/2)}{\|s_1(t) + a\,s_1(t - T_b/2)\|}$

with $a$ being the multipath index. The received signal from user 1 is thus

$b_1(i)\,h_1(t - iT_b), \quad t \in [iT_b, (i+1)T_b]$

The self-interference effect is modeled as an additional interfering user, user $K+1$, which uses a shifted version of the first user's signal (see (19)).


2. $K$ interfering users, each with a different 31-chip Gold sequence and an amplitude $A_k$ ($k = 2, \dots, K$). Each received signal waveform from the interfering users is assumed to have unit energy. The received signal from the $k$-th interferer is thus

$A_k b_k(i)\,s_k(t - iT_b), \quad t \in [iT_b, (i+1)T_b], \quad k = 2, \dots, K \quad (20)$

3. An additive white Gaussian zero-mean noise process with variance $\sigma^2$.
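The 31-chip Gold codes and the truncated two-path waveform $h_1(t)$ of item 1 can be generated as sketched below. This is an illustration built on several choices made here, not the authors' simulator. The choices are the preferred pair of degree-5 polynomials (octal 45 and 75), the LFSR initial state, two samples per chip so that the $T_b/2$ delay is an integer shift, and the function names.

```python
import numpy as np


def m_sequence(taps, n=5):
    """Binary m-sequence of period 2**n - 1 from the recurrence
    a[m+n] = sum_{j in taps} a[m+j] (mod 2), i.e. p(x) = x^n + sum_j x^j."""
    a = [1] + [0] * (n - 1)                   # any nonzero initial state
    while len(a) < 2**n - 1:
        m = len(a) - n
        a.append(sum(a[m + j] for j in taps) % 2)
    return np.array(a)


U_SEQ = m_sequence([0, 2])                    # x^5 + x^2 + 1
V_SEQ = m_sequence([0, 2, 3, 4])              # x^5 + x^4 + x^3 + x^2 + 1


def gold_code(k):
    """k-th Gold code (k = 0..30) as unit-energy +/-1 chips."""
    bits = U_SEQ ^ np.roll(V_SEQ, -k)
    return (1 - 2 * bits) / np.sqrt(31)


def multipath_waveform(s, a, sps=2):
    """h_1 of item 1: s(t) + a s(t - Tb/2) restricted to [0, Tb], unit norm."""
    x = np.repeat(s, sps)                      # sps samples per chip
    half = len(x) // 2                         # Tb/2 in samples
    delayed = np.zeros_like(x)
    delayed[half:] = x[:len(x) - half]         # the part beyond Tb becomes user K+1
    h = x + a * delayed
    return h / np.linalg.norm(h)


def periodic_xcorr(c1, c2):
    """Periodic cross-correlation in chip units (integers for +/-1 codes)."""
    return np.array([round(31 * float(c1 @ np.roll(c2, t))) for t in range(31)])


h1 = multipath_waveform(gold_code(0), a=1.0)
print(sorted(set(periodic_xcorr(gold_code(0), gold_code(5)).tolist())))
```

For a preferred pair of degree 5, the periodic cross-correlations within the Gold family take only the values $-1$, $-9$ and $7$ (in chips), which is why such codes are a common choice for 31-chip CDMA simulations.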

The performance is measured in terms of the signal-to-interference ratio, defined as the ratio (in dB) between the power of the information-bearing signal and the power of the interfering signals that pass through the linear receiver $c_1$:

$\mathrm{SIR} = 10\log_{10}\dfrac{A_1^2\,\langle s_1, c_1\rangle^2}{\sum_{k=2}^{K+1} A_k^2\,\langle s_k, c_1\rangle^2} \quad (21)$
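A direct transcription of the SIR definition in (21) is sketched below in Python/NumPy. It computes the desired-signal power over the interference power at the output of a given linear receiver vector $c_1$. The argument names are hypothetical, and the signals are assumed to be real sampled vectors.

```python
import numpy as np


def sir_db(c1, s1, A1, S_int, A_int):
    """SIR in dB: desired-user power over the interference power seen through
    the linear receiver c1 (rows of S_int are the interfering waveforms)."""
    signal = (A1 * (s1 @ c1)) ** 2
    interference = np.sum((A_int * (S_int @ c1)) ** 2)
    return 10.0 * np.log10(signal / interference)


c1 = np.array([1.0, 0.0, 0.0])
s1 = np.array([1.0, 0.0, 0.0])
S_int = np.array([[0.1, 0.0, 1.0]])        # one interferer leaking 0.1 into c1
value = sir_db(c1, s1, 1.0, S_int, np.array([1.0]))   # about 20 dB
```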


The simulations consider neither time-variant multipath nor loss of synchronization between the transmitters and the receiver. At the time of this writing, the ability of the receiver to track deep fades is being tested, and the results will be presented as soon as possible.


Part V 

... and the simulation 
results. 


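The self-interference user $K+1$ of Part IV is received as

$A_{K+1}\,b_{K+1}(i)\,s_{K+1}(t - iT_b) = a\,b_{K+1}(i)\,s_1(t + T_b/2 - iT_b), \quad t \in [iT_b, (i+1)T_b] \quad (19)$

where $a$ is again the multipath index.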


Fig. 4 shows the SIR of the proposed receiver versus the SNR of the first user's signal.

The transitions from talk to silence are performed every 10 symbols of the first user. The simulation parameters are shown in Table 1.
Figure 4: SIR vs SNR for user 1 (plot title: "Blind Adaptive MUD - SIR").


The two curves labeled “Silence w Multipath” and “No Silence w Multipath” represent, respectively, the performance of the receiver with and without the silence adaptation algorithm.

The two curves labeled “Silence w/o Multipath” and “No Silence w/o Multipath” represent the performance in the case of a reception not corrupted by any multipath effect, that is, with $a = 0$.


Part VI 

Appendix 

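Here we derive the expression for the Mean Silence Output Energy. Let us start from the definition of MSiE given in (4), neglecting the noise (low-noise region, $\sigma \to 0$):

$E\big[(\langle y_s, \hat s_1\rangle + \langle y, x_1\rangle)^2\big] = E\big[\langle y_s, \hat s_1\rangle^2 + \langle y, x_1\rangle^2 + 2\langle y_s, \hat s_1\rangle\langle y, x_1\rangle\big]$

$= E\Big[\Big(\sum_{k=2}^{K} A_k b'_k\,\langle s_k, \hat s_1\rangle\Big)^2 + \Big(\sum_{k=1}^{K} A_k b_k\,\langle s_k, x_1\rangle\Big)^2 + 2\sum_{k=2}^{K}\sum_{h=1}^{K} A_k A_h\, b'_k b_h\,\langle s_k, \hat s_1\rangle\langle s_h, x_1\rangle\Big]$

Since the bits are independent, equiprobable and take values $\pm 1$, and the silence-period bits $b'_k$ are independent of the talk-period bits $b_h$, only the squared terms survive the $E[\cdot]$ operator:

$= E\Big[\sum_{k=2}^{K} A_k^2\,\langle s_k, \hat s_1\rangle^2 + \sum_{k=1}^{K} A_k^2\,\langle s_k, x_1\rangle^2\Big]$

$= A_1^2\,\langle s_1, x_1\rangle^2 + \sum_{k=2}^{K} A_k^2\big(\langle s_k, \hat s_1\rangle^2 + \langle s_k, x_1\rangle^2\big)$

and, keeping only the terms that vary with $\hat s_1$,

$\mathrm{MSiE}[\hat s_1] = \sum_{k=2}^{K} A_k^2\,\langle s_k, \hat s_1\rangle^2 \quad (22)$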
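The appendix result can be checked numerically. The Python sketch below uses toy sizes and amplitudes chosen arbitrarily and neglects noise, as in the low-noise region. It averages the silence output energy over every equiprobable pattern of talk bits $b_k$ and silence bits $b'_k$, so the expectation is exact rather than a Monte Carlo estimate. It then compares the average with the closed form before the $\hat s_1$-independent terms are dropped.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
N, K = 15, 4                                  # small sizes so all bit patterns can be enumerated
S = rng.standard_normal((K, N))
S /= np.linalg.norm(S, axis=1, keepdims=True)  # rows s_1..s_K with unit energy
A = np.array([1.0, 2.0, 0.5, 1.5])            # amplitudes A_1..A_K (arbitrary)
s_hat = rng.standard_normal(N)
s_hat /= np.linalg.norm(s_hat)
x1 = 0.2 * rng.standard_normal(N)


def msie_exact():
    """E[(<y_s, s_hat> + <y, x1>)^2] averaged over all equiprobable bit patterns."""
    total, count = 0.0, 0
    for b in itertools.product((-1, 1), repeat=K):            # talk bits b_1..b_K
        y = (A * np.array(b)) @ S
        for bp in itertools.product((-1, 1), repeat=K - 1):   # silence bits b'_2..b'_K
            y_s = (A[1:] * np.array(bp)) @ S[1:]
            total += (y_s @ s_hat + y @ x1) ** 2
            count += 1
    return total / count


closed_form = (A[0] ** 2 * (S[0] @ x1) ** 2
               + np.sum(A[1:] ** 2 * ((S[1:] @ s_hat) ** 2 + (S[1:] @ x1) ** 2)))
print(msie_exact(), closed_form)
```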
ABSTRACT

This paper presents a linear cancellation detector for the Gaussian channel that operates over limited time intervals of the received signal, in a way similar to a multi-input multi-output FIR filter. The parameters defining the detector become time-invariant, and the conditions to be met by the signatures are stated. A bound on the multiuser interference due to the limited time correlation, and upper and lower bounds on the error probability, are obtained. The theoretical bounds and numerical results show that the detector is adequate for systems intended for many users whose amplitudes can be restricted to a given range.

1. INTRODUCTION 

In a seminal paper, Verdu [1] demonstrated that the optimal multiuser coherent detector can be implemented using the Viterbi algorithm, whose complexity is exponential in the number of users. For this reason the optimal detector is not suitable for real systems, and less complex suboptimal alternatives [2] have been studied. The decorrelating detector, a member of the suboptimal linear family, is not exponential, is near-far resistant, and can be implemented as a K-input, K-output linear time-invariant filter [3] (K is the number of users). Several strategies have been adopted to maximize the efficiency of the implementation of this linear detector, such as the sliding window [4] or isolation bit insertion [5], where the tridiagonal structure of the correlation matrix is exploited. If the system design allows for a slight level of interference, then it is possible to realize very efficient detectors with limited linear cancellation, as will be shown in this paper.

This work was partially supported by CICYT under grant No. TIC95-0320.


2. SYSTEM MODEL 

In a CDMA system several users share the medium at the same time. Each user employs a different code, i.e., a different waveform or signature, to carry its own information. Following the notation of Verdu [1], the multiuser signal is given by

$\sum_{i=0}^{N-1}\sum_{k=0}^{K-1}\sqrt{w_k}\,b_k(i)\,s_k(t - iT - \tau_k) \quad (1)$

where $b_k(i)$ is the bit transmitted by user $k$ in the $i$-th period, taken from the binary alphabet $b_k(i) \in \{-1, 1\}$. Moreover, $s_k(t)$ is the signature associated with user $k$, $T$ is the bit period, $\tau_k$ is the delay associated with user $k$, assumed to be lower than the bit period, $K$ is the number of users and $N$ is the number of bits transmitted by each user.

By denoting as $u_{k,i}(t)$ the signal $s_k(t - iT - \tau_k)$ scaled to unit norm, the multiuser signal must lie in the subspace generated by the basis $\{u_{k,i}(t)\}$, assuming linear independence of these signals. The coefficient corresponding to the signal $u_{k,i}(t)$ indicates the bit of user $k$ in interval $i$, weighted by the attenuation suffered along the path. The same information can be obtained by the receiver if it uses the reciprocal basis $\{v_{k,i}(t)\}$, whose generic element is defined by

$\langle v_{g,l}(t), u_{k,i}(t)\rangle = \begin{cases} 1 & \text{if } g = k \text{ and } l = i\\ 0 & \text{otherwise}\end{cases} \quad (2)$
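Numerically, for a finite set of sampled, linearly independent signals, the reciprocal basis of (2) is obtained by applying the inverse of the Gram (correlation) matrix to the original basis. The NumPy sketch below uses random stand-in signals rather than CDMA signatures. It shows the biorthogonality and how the reciprocal basis recovers the coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.standard_normal((6, 40))              # rows: basis signals u_m (stand-ins)
U /= np.linalg.norm(U, axis=1, keepdims=True)
G = U @ U.T                                   # Gram (correlation) matrix <u_m, u_n>
V = np.linalg.solve(G, U)                     # rows: reciprocal basis, V = G^{-1} U

coeffs = np.array([1.0, -1.0, 0.5, -2.0, 1.0, 0.3])   # e.g. amplitude-weighted bits
r = coeffs @ U                                # a signal in span{u_m}
recovered = V @ r                             # <r, v_m> returns each coefficient
```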

The elements of the reciprocal basis can be expressed as linear combinations of those of the original basis $\{u_{k,i}(t)\}$, and to this end the following matrices and vectors are defined:
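$R = \begin{pmatrix} R_0 & R_{-1} & 0 & \cdots & 0\\ R_1 & R_0 & R_{-1} & \ddots & \vdots\\ 0 & \ddots & \ddots & \ddots & 0\\ \vdots & \ddots & R_1 & R_0 & R_{-1}\\ 0 & \cdots & 0 & R_1 & R_0 \end{pmatrix}, \qquad a^{k,i} = \begin{pmatrix} a_0^{k,i}\\ a_1^{k,i}\\ \vdots\\ a_{N-1}^{k,i}\end{pmatrix}, \qquad e^{k,i} = \begin{pmatrix} 0\\ \vdots\\ e_k\\ \vdots\\ 0\end{pmatrix} \quad (3)$

where $e_k$ occupies block position $i$ of $e^{k,i}$ and is the $k$-th column of the $K \times K$ identity matrix, $a^{k,i}$ is the coefficient vector of the reciprocal basis element $v_{k,i}(t)$ corresponding to bit $i$, and $R_0$, $R_1$ and $R_{-1} = R_1^T$ are the correlation matrices for lags 0, 1 and $-1$, respectively. From (2) and the above definitions, the following matrix equation is obtained:

$R\,a^{k,i} = e^{k,i} \quad (4)$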

To obtain the solution to (4), the following matrices are defined recursively:

$A_0 = B_0 = R_0, \qquad A_i = R_0 - R_1 A_{i-1}^{-1} R_1^T, \qquad B_i = R_0 - R_1^T B_{i-1}^{-1} R_1 \quad (5)$

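Making substitutions from the first equation in (4) down to the $i$-th, and from the last up to the $i$-th, an easier-to-solve system is obtained, namely

$a_j^{k,i} = -A_j^{-1} R_1^T\,a_{j+1}^{k,i}, \quad j < i$

$\big(R_0 - R_1 A_{i-1}^{-1} R_1^T - R_1^T B_{N-2-i}^{-1} R_1\big)\,a_i^{k,i} = e_k$

$a_j^{k,i} = -B_{N-1-j}^{-1} R_1\,a_{j-1}^{k,i}, \quad j > i \quad (6)$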
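The following NumPy sketch checks (5)-(6) numerically. The setup is assumed only for the illustration: three users with random binary signatures of 32 samples per bit, fixed delays shorter than the bit period, and 41 bits. It builds the block-tridiagonal matrix $R$ of (3), solves (4) directly, and compares the central coefficient vectors with those given by the limits of the recursions.

```python
import numpy as np

rng = np.random.default_rng(3)
K, M, N = 3, 32, 41                           # users, samples per bit, bits (assumed)
sig = rng.choice([-1.0, 1.0], size=(K, M))    # hypothetical binary signatures
tau = np.array([0, 7, 19])                    # delays in samples, all < M


def basis(N):
    """Rows u_{k,i} (row index i*K + k): signature k delayed by i*M + tau_k, unit norm."""
    U = np.zeros((N * K, (N + 1) * M))
    for i in range(N):
        for k in range(K):
            start = i * M + tau[k]
            U[i * K + k, start:start + M] = sig[k] / np.sqrt(M)
    return U


R = basis(N) @ basis(N).T                     # block-tridiagonal matrix of (3)
R0 = R[K:2 * K, K:2 * K]                      # lag 0
R1 = R[K:2 * K, 0:K]                          # block row j, block column j - 1


def recursion_limits(R0, R1, iters=200):
    """Iterate (5) from A_0 = B_0 = R0 and return the (approximate) limits."""
    A, B = R0.copy(), R0.copy()
    for _ in range(iters):
        A = R0 - R1 @ np.linalg.solve(A, R1.T)
        B = R0 - R1.T @ np.linalg.solve(B, R1)
    return A, B


A, B = recursion_limits(R0, R1)
# Central equation of (6) in the limit: column k is a_i^{k,i}.
center = np.linalg.inv(R0 - R1 @ np.linalg.solve(A, R1.T) - R1.T @ np.linalg.solve(B, R1))

i = N // 2
E = np.zeros((N * K, K))
E[i * K:(i + 1) * K] = np.eye(K)              # e^{k,i} for all k at once
a_direct = np.linalg.solve(R, E)              # direct solution of (4)
```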

2.1. CONVERGENCE OF THE RECURSIVE MATRIX FUNCTIONS

Assuming that the recursions (5) have limits $A = \lim_{n\to\infty} A_n$ and $B = \lim_{n\to\infty} B_n$, respectively, then, as the number of bits tends to infinity, the coefficient vector $a^{k,i}$ does not depend on the interval $i$. Therefore the reciprocal element $v_{k,i}(t)$ is invariant to shifts of the index $i$, and it is not necessary to compute it for each interval.

The signatures must have correlation matrices $R_0$ and $R_1$ that make the recursion (5) converge, in order to guarantee that $v_{k,i}(t)$ is invariant. Arranging in a vector $v_n$ the elements of $A_n$ that really intervene in the iteration, the iteration becomes $v_{n+1} = f_A(v_n)$, where $f_A(\cdot)$ is a nonlinear matrix function.


The convergence is determined by the behavior of the function near the stationary point $v_E = f_A(v_E)$ [6], [7]. In a neighborhood of the stationary point, the function $f_A(\cdot)$ can be replaced by its linear approximation if it has continuous first derivatives; in this case

$f_A(x) \approx f_A(v_E) + Df_A(v_E)\,(x - v_E)$

where $Df_A(v_E)$ is the Jacobian of the vector function $f_A(\cdot)$ evaluated at the stationary point $v_E$, and $x$ is any point in the neighborhood. The point $x$ can be regarded as the sum of $v_E$ and a perturbation $\xi$. The perturbation at step $n+1$ is given by $\xi_{n+1} = Df_A(v_E)\,\xi_n$, so that $\xi_n = (Df_A(v_E))^n\,\xi_0$, and the iteration converges to $v_E$ if and only if $\lim_{n\to\infty}\|\xi_n\| = 0$, where the norms, here and in the rest of the paper, are 2-norms. This limit is 0 for every $\xi_0$ if and only if [6] the spectral radius of the Jacobian of $f_A(\cdot)$ evaluated at the stationary point is lower than one:

$\rho\big(Df_A(v_E)\big) < 1$

At the same time, the spectral radius directly indicates the convergence rate and allows different sets of signatures to be compared.
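The convergence condition can be evaluated numerically. For the forward recursion $f_A(X) = R_0 - R_1X^{-1}R_1^T$ of (5), the derivative at the fixed point $A$ maps a perturbation $\Xi$ to $P\,\Xi\,P^T$ with $P = R_1A^{-1}$. The spectral radius of the Jacobian is therefore $\rho(P)^2$, and $\rho(P) = \rho(A^{-1}R_1^T)$. The NumPy sketch below uses small hand-picked $R_0$ and $R_1$, not matrices derived from real signatures. It compares a finite-difference Jacobian over all matrix entries with this value.

```python
import numpy as np

R0 = np.array([[1.0, 0.1, 0.05],
               [0.1, 1.0, 0.1],
               [0.05, 0.1, 1.0]])              # assumed lag-0 correlations
R1 = np.array([[0.2, 0.05, 0.0],
               [0.1, 0.15, 0.05],
               [0.0, 0.1, 0.2]])               # assumed lag-1 correlations


def f_A(X):
    """One step of the forward recursion (5)."""
    return R0 - R1 @ np.linalg.solve(X, R1.T)


A = R0.copy()
for _ in range(500):                           # fixed point A = f_A(A)
    A = f_A(A)

# Central finite-difference Jacobian of vec(f_A) with respect to vec(X) at X = A.
n, h = A.size, 1e-6
J = np.empty((n, n))
for col in range(n):
    E = np.zeros(n)
    E[col] = h
    E = E.reshape(A.shape)
    J[:, col] = (f_A(A + E) - f_A(A - E)).ravel() / (2 * h)

rho_jacobian = max(abs(np.linalg.eigvals(J)))
rho_P = max(abs(np.linalg.eigvals(np.linalg.solve(A, R1.T))))   # rho(A^{-1} R1^T)
print(rho_jacobian, rho_P ** 2)
```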

3. REALIZATION OF THE DETECTOR

The detection of the $i$-th bit of the $k$-th user is performed by correlating the received signal $r(t)$ with the element $v_{k,i}(t)$, which is considered invariant:

$\hat b_k(i) = \mathrm{sgn}\big(\langle r(t), v_{k,i}(t)\rangle\big) \quad (8)$

From (4) it is clear that the number of coefficient vectors defining $v_{k,i}(t)$ equals the number of bits, which can be large. It is desirable to deal with a finite number of coefficients in each bit interval. The solution of the system (4), (6), in case of convergence of the matrix recursion, is

$a_{i-l}^{k,i} = \big(-A^{-1}R_1^T\big)^l a_i^{k,i}, \qquad a_{i+l}^{k,i} = \big(-B^{-1}R_1\big)^l a_i^{k,i}, \qquad a_i^{k,i} = \big(R_0 - R_1A^{-1}R_1^T - R_1^TB^{-1}R_1\big)^{-1} e_k \quad (9)$

This shows that the coefficient vectors that are $L+1$ or more intervals away from vector $i$ vanish progressively if the spectral radii of the matrices $A^{-1}R_1^T$ and $B^{-1}R_1$ are lower than 1: the norm of the coefficients tends to 0 when $L$ is large enough [6]. Hence, if a small error is allowed in forming $v_{k,i}(t)$, only a finite set of coefficients, $a_{i-L}^{k,i}, \dots, a_{i+L}^{k,i}$, is needed to define the reciprocal basis for any interval $i$.

Figure 1: Structure of the detector (correlator, sampler and filter acting on $r(t)$ to produce the detected bits).

In the receiver, the received signal $r(t)$ is correlated with the $K$ signals $v_{k,i}(t)$, which are obtained from $\{u_{k,i}(t)\}$ by means of a finite and invariant set of coefficients. The resulting scheme, shown in Fig. 1, consists of a correlator followed by an invariant multi-input, multi-output filter, very similar to an FIR filter, in which the taps are the coefficients $a_{i+l}^{k,i}$, with delays of one bit period.
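The correlator-plus-FIR structure of Fig. 1 can be sketched as follows. The sketch makes the same kind of illustrative assumptions as before: three users, random binary signatures, 32 samples per bit, fixed delays, arbitrary user energies and noise level, and hypothetical variable names. The taps are built from the limits of (5) through (9) and truncated to $2L+1$ bit intervals, and only bits with $L$ neighbors on each side are detected.

```python
import numpy as np

rng = np.random.default_rng(4)
K, M, N, L = 3, 32, 60, 4                   # users, samples/bit, bits, half-length (assumed)
sig = rng.choice([-1.0, 1.0], size=(K, M))
tau = np.array([0, 11, 23])
U = np.zeros((N * K, (N + 1) * M))          # rows u_{k,i}, row index i*K + k
for i in range(N):
    for k in range(K):
        U[i * K + k, i * M + tau[k]:i * M + tau[k] + M] = sig[k] / np.sqrt(M)
R = U @ U.T
R0, R1 = R[K:2 * K, K:2 * K], R[K:2 * K, 0:K]

A, B = R0.copy(), R0.copy()
for _ in range(200):                        # limits of the recursions (5)
    A = R0 - R1 @ np.linalg.solve(A, R1.T)
    B = R0 - R1.T @ np.linalg.solve(B, R1)
C = np.linalg.inv(R0 - R1 @ np.linalg.solve(A, R1.T) - R1.T @ np.linalg.solve(B, R1))
back, fwd = -np.linalg.solve(A, R1.T), -np.linalg.solve(B, R1)
# taps[l + L]: column k holds the coefficient vector a_{i+l}^{k,i} of (9).
taps = [np.linalg.matrix_power(back, -l) @ C if l < 0
        else np.linalg.matrix_power(fwd, l) @ C for l in range(-L, L + 1)]

w_sqrt = np.array([1.0, 3.0, 0.5])          # square roots of the user energies (diag of W)
bits = rng.choice([-1.0, 1.0], size=(N, K))
r = (bits * w_sqrt).ravel() @ U + 0.01 * rng.standard_normal(U.shape[1])

z = (U @ r).reshape(N, K)                   # correlator + sampler: <r, u_{k,j}>
b_hat = np.array([np.sign(sum(taps[l + L].T @ z[i + l] for l in range(-L, L + 1)))
                  for i in range(L, N - L)])
```

The filter only needs the $2L+1$ tap matrices, which do not depend on $i$. This is the point of the invariance established in Section 2.1.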

4. MULTIUSER INTERFERENCE BOUND

It is assumed that the iteration is near enough to its limit for the remaining error to be negligible. The bits transmitted by the different users in interval $j$ can be arranged in a vector $b_j$, and the square roots of the energies associated with them placed on the diagonal of a matrix $W$. The row of transmitted bits, weighted by their corresponding attenuations, is given by

$(bW) = \big[\,b_0^T W \;\; b_1^T W \;\; \cdots \;\; b_{N-1}^T W\,\big] \quad (10)$

To obtain a bound on the multiuser interference, the vector $\tilde a^{k,i}$ is defined. It is made up of the $2L+1$ coefficient vectors centered around the vector of subscript $i$, padded with zero vectors to complete $N$ components. This vector can be expressed as the sum of the exact and complete coefficient vector $a^{k,i}$ plus an error vector:

$\tilde a^{k,i} = a^{k,i} + er^{k,i} \quad (11)$

The bit estimated by the detector, which uses the reciprocal element $\tilde v_{k,i}(t)$ built from $\tilde a^{k,i}$, can be expressed as

$\hat b_k(i) = \mathrm{sgn}\big(\langle r(t), \tilde v_{k,i}(t)\rangle\big) = \mathrm{sgn}\big(y_k^{(i)} + N_k^{(i)}\big) \quad (12)$

where $y_k^{(i)}$ is the contribution of the multiuser signal to the detected bit and $N_k^{(i)}$ groups the effect of the noise.

Using (11) and (12), it is possible to express explicitly the effect of an imperfect interference cancellation due to not taking into account all $N$ intervals in forming the reciprocal basis. Thus

$y_k^{(i)} = (bW)\,R\,a^{k,i} + (bW)\,R\,er^{k,i} = \sqrt{w_k}\,b_k(i) + I_k(i) \quad (13)$

so that the contribution of the multiuser signal to the detected bit can be decomposed into two parts: one due to a perfect interference cancellation, which in the absence of noise would lead to a perfect detection, and another, $I_k(i)$, that represents the multiuser interference to user $k$ in interval $i$ due to the limited cancellation.

In the computation of the coefficients, the number of iterations in the recursions (5) is considered large enough for the errors to be negligible, because the price paid for this is only an increase in the number of iterations performed, not in the circuitry. The error vector is then

$er^{k,i} = -\big[\,a_0^{k,i}\;\cdots\;a_{i-L-1}^{k,i}\;\; 0\;\cdots\;0\;\; a_{i+L+1}^{k,i}\;\cdots\;a_{N-1}^{k,i}\,\big]^T \quad (14)$

From this, it is clear that the multiuser interference can be decomposed into two terms, one affected by the coefficients of the past, the backward interference $I_{k,i}^-$, and the other affected by those of the future, the forward interference $I_{k,i}^+$: $I_k(i) = I_{k,i}^- + I_{k,i}^+$.

Letting the number of bits $N$ tend to infinity, the backward and forward interference terms can be expressed as

$I_{k,i}^- = -\sum_{l=L+1}^{\infty}\big(b_{i-l-1}^T W R_1^T + b_{i-l}^T W R_0 + b_{i-l+1}^T W R_1\big)\,a_{i-l}^{k,i}$

$I_{k,i}^+ = -\sum_{l=L+1}^{\infty}\big(b_{i+l-1}^T W R_1^T + b_{i+l}^T W R_0 + b_{i+l+1}^T W R_1\big)\,a_{i+l}^{k,i} \quad (15)$

From this, a bound on the multiuser interference can easily be obtained by substituting expression (9) for the coefficients in (15). In the numerical calculations the matrices $A^{-1}R_1^T$ and $B^{-1}R_1$ were simple, and this is the more likely case. Using this fact, a bound on the multiuser interference is obtained from (15):


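$I_{MSR} = \max\big(|I_{k,i}^-| + |I_{k,i}^+|\big) \le \|\mathrm{diag}(W)\|\,\big(\|R_1^T\| + \|R_1\| + \|R_0\|\big)\,\big\|\big(R_0 - R_1A^{-1}R_1^T - R_1^TB^{-1}R_1\big)^{-1}\big\| \times \kappa(V_A)\,\rho_A^{L+1}\cdots\kappa(V_B)\,\rho_B^{L+1} \quad (16)$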

where $\kappa(V_A)$ and $\rho_A$ are the condition number and the spectral radius associated with the matrix $A^{-1}R_1^T$; $\kappa(V_B)$ and $\rho_B$ have a similar meaning for the matrix $B^{-1}R_1$. From this equation, bounds on the error probability $P_k$ of user $k$ are obtained in a straightforward way, assuming equally likely bits. They are expressed in terms of

$Q(x) = \int_x^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt$

with $\sigma^2$ the spectral density of the noise.

Figure 2: Maximum multiuser interference for 16 Gaussian signatures.
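The tail function $Q(x)$ is available through the complementary error function, $Q(x) = \tfrac12\,\mathrm{erfc}(x/\sqrt2)$. The Python sketch below also shows one standard way of turning an interference bound such as $I_{MSR}$ into lower and upper bounds on $P_k$ for equally likely bits. It assumes that the detector statistic is $\sqrt{w_k}\,b_k(i) + I_k(i)$ plus zero-mean Gaussian noise of standard deviation `sigma_eff`, with $|I_k(i)| \le I_{MSR}$. This form is an assumption for illustration, not a transcription of the paper's expression.

```python
import math


def qfunc(x):
    """Gaussian tail probability Q(x): integral from x to infinity of the N(0, 1) density."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probability_bounds(sqrt_wk, i_msr, sigma_eff):
    """Lower/upper bounds on P_k when |I_k| <= i_msr (assumed Gaussian noise
    with standard deviation sigma_eff at the detector output)."""
    lower = qfunc((sqrt_wk + i_msr) / sigma_eff)   # most favorable interference
    upper = qfunc((sqrt_wk - i_msr) / sigma_eff)   # worst-case interference
    return lower, upper


p_lo, p_hi = error_probability_bounds(sqrt_wk=1.0, i_msr=0.05, sigma_eff=0.3)
```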

5. NUMERICAL RESULTS

Several numerical experiments have been carried out in order to validate the theoretical results and to gain insight into the behavior of the detector. The experiments were carried out with 8, 16, 32 and 64 users, using different sets of signatures, binary and non-binary. Fig. 2 shows the results corresponding to 16 non-binary signatures. From the results it is clear that the simulated maximum multiuser interference follows the same exponential trend as its theoretical bound, although the theoretical bound is always larger.

It is also clear, from the experiments and the theoretical bound, that only a small number of coefficients is needed to keep the interference below a value fixed in advance.

6. CONCLUSIONS 

A method for the limited linear cancellation of multiuser interference has been presented, together with a bound on the multiuser interference. The theoretical analysis and the numerical results show that the method of limited linear cancellation of multiuser interference presented here is a good alternative for the implementation of DS/CDMA systems over Gaussian channels.
ABSTRACT

In this paper we study the blind separation problem of non-Gaussian independent sources. We show how to generalize the separation problem in convolutive mixtures to any number of sources, and we present a new algorithm that solves it. The algorithm minimizes a functional given by the quadratic sum of a special set of cross-cumulants between output signals, and can be interpreted as a quasi-Newton search for the zeros of the gradient of this functional. Local asymptotic stability of the algorithm is proved, and its relation with an alternative cancelation criterion is shown.

We also study another approach to the blind separation problem of two sources in instantaneous mixtures, based on a robust cumulant estimation criterion.

1. INTRODUCTION 

Blind separation of sources can be defined as the problem of identifying and estimating multiple source signals from an array of sensors without knowing the characteristics of the transmission channels. The typical assumptions that can be relied on are the linearity of the transmission channels and the independence of the sources. Two basic architectures are possible in the separation process, the feed-forward and the feed-backward, shown in Figures 1 and 2, respectively.

In the recent literature on the separation problem, cumulant-based methods have played a central role when two sources are mixed, but less attention has been given to the generalization to mixtures of any number of sources.

We first analyze the separation problem of two sources in instantaneous mixtures, while in Section 3 we show how to deal with multiple sources in convolutive and instantaneous mixtures. In Sections 4 and 5 we propose and derive the minimization and cancelation algorithms. Finally, Section 6 shows a separation example that corroborates the theoretical results.


2. SEPARATION OF TWO SOURCES 

Consider a simplified signal mixing model where, by means of two sensors, one observes two instantaneous linear mixtures, $y_1(n)$ and $y_2(n)$, of zero-mean sources $x_1(n)$ and $x_2(n)$. The assumptions are that the sources are non-Gaussian and statistically independent. If the channels are noiseless, the sensor outputs are given by

$y_1(n) = x_1(n) + a_{12}\,x_2(n), \qquad y_2(n) = a_{21}\,x_1(n) + x_2(n) \quad (1)$

where $a_{12}$ and $a_{21}$ are the instantaneous mixing coefficients. If we assume that the channel is lossy and without echoes, the above mixing coefficients satisfy $|a_{ij}| < 1$ for $i, j = 1, 2$, $i \ne j$.

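The objective is to obtain the source separation by estimating a $2 \times 2$ matrix $B$ (see Fig. 3) such that

$\begin{pmatrix} s_1(n)\\ s_2(n)\end{pmatrix} = \begin{pmatrix} 1 & b_{12}\\ b_{21} & 1\end{pmatrix}\begin{pmatrix} y_1(n)\\ y_2(n)\end{pmatrix} = \begin{pmatrix} 1 + a_{21}b_{12} & a_{12} + b_{12}\\ a_{21} + b_{21} & 1 + a_{12}b_{21}\end{pmatrix}\begin{pmatrix} x_1(n)\\ x_2(n)\end{pmatrix} \quad (2)$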

The separation is achieved if the coefficients $b_{12}$ and $b_{21}$ are such that (i) $b_{ij} = -a_{ij}$, or (ii) $b_{ij} = -1/a_{ji}$, $i, j = 1, 2$, $i \ne j$. In this paper, only the first solution is treated. For this case, the separated signals $s_1(n)$ and $s_2(n)$ are related to the sources by

$s_i(n) = (1 - b_{12}b_{21})\,x_i(n), \quad i = 1, 2 \quad (3)$
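A quick numerical check of (1)-(3) is sketched below in Python/NumPy, with arbitrary mixing coefficients and uniform (non-Gaussian) sources. With solution (i), each output is a scaled copy of its own source. With solution (ii), the diagonal of the global matrix vanishes, so the sources come out in swapped order.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 1000))    # independent sources
a12, a21 = 0.4, 0.3                                           # |a_ij| < 1 (assumed values)
A = np.array([[1.0, a12], [a21, 1.0]])
y = A @ x                                                     # sensor outputs, eq. (1)

B1 = np.array([[1.0, -a12], [-a21, 1.0]])                     # solution (i): b_ij = -a_ij
s = B1 @ y                                                    # eq. (2)

B2 = np.array([[1.0, -1.0 / a21], [-1.0 / a12, 1.0]])         # solution (ii): b_ij = -1/a_ji
H2 = B2 @ A                                                   # global matrix for (ii)
```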

Note that the estimation of all fourth-order cumulants of $s_1(n)$ and $s_2(n)$ requires the previous estimation of the matrix $B$ in a recursive manner, which may be very time-consuming. In addition, it in fact implies an adaptive environment. We propose an alternative where the cross-cumulants of $s_1(n)$ and $s_2(n)$ are given in terms of the cross-cumulant estimates of the sensor outputs $y_1(n)$ and $y_2(n)$.
2.1. FOURTH-ORDER CROSS-CUMULANT ESTIMATION

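If the sources $x_1(n)$ and $x_2(n)$ are assumed to be zero-mean, stationary, fourth-order non-Gaussian white noises and statistically independent, it can be shown that the cross-cumulants of the outputs $s_1(n)$ and $s_2(n)$ can be expressed as [1]

$\gamma_{13}(s_1, s_2) = h_{11}h_{21}^3\,\gamma_{x_1} + h_{12}h_{22}^3\,\gamma_{x_2} \quad (4)$

$\gamma_{31}(s_1, s_2) = h_{11}^3h_{21}\,\gamma_{x_1} + h_{12}^3h_{22}\,\gamma_{x_2} \quad (5)$

$\gamma_{22}(s_1, s_2) = h_{11}^2h_{21}^2\,\gamma_{x_1} + h_{12}^2h_{22}^2\,\gamma_{x_2} \quad (6)$

where, from relation (2),

$h_{ii} = 1 + a_{ji}b_{ij} \quad (7)$

$h_{ij} = a_{ij} + b_{ij}, \quad i, j = 1, 2,\ i \ne j \quad (8)$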

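and $\gamma_{x_i}$ represents the kurtosis of $x_i(n)$, $i = 1, 2$, while $\gamma_{\alpha\beta}(u, v)$ denotes the fourth-order cross-cumulant involving $\alpha$ copies of $u$ and $\beta$ copies of $v$. We note that all expressions in (4)-(6) are given in terms of $\gamma_{x_i}$, i.e., the kurtosis of the unknown sources. It is more useful to express (4)-(6) in terms of the sensor output cross-cumulants, since these quantities are directly measurable. For this purpose, we first obtain expressions for the kurtosis and cross-cumulants of the sensor outputs, which are given by

$\gamma_{y_1} = \gamma_{x_1} + a_{12}^4\,\gamma_{x_2} \quad (9)$

$\gamma_{y_2} = a_{21}^4\,\gamma_{x_1} + \gamma_{x_2} \quad (10)$

$\gamma_{13}(y_1, y_2) = a_{21}^3\,\gamma_{x_1} + a_{12}\,\gamma_{x_2} \quad (11)$

$\gamma_{31}(y_1, y_2) = a_{21}\,\gamma_{x_1} + a_{12}^3\,\gamma_{x_2} \quad (12)$

$\gamma_{22}(y_1, y_2) = a_{21}^2\,\gamma_{x_1} + a_{12}^2\,\gamma_{x_2} \quad (13)$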
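Equations (9)-(13) make the sensor-output cumulants the quantities to estimate, so a sample estimate of them is sketched below in Python/NumPy. The estimator is the standard fourth-order joint cumulant of centered signals. The uniform sources (kurtosis $-1.2$ at unit variance), the mixing values and the sample size are arbitrary choices for the illustration.

```python
import numpy as np


def cum4(p, q, r, s):
    """Sample fourth-order joint cumulant cum(p, q, r, s) of centered signals."""
    p, q, r, s = (z - z.mean() for z in (p, q, r, s))
    m = lambda *z: np.mean(np.prod(z, axis=0))
    return m(p, q, r, s) - m(p, q) * m(r, s) - m(p, r) * m(q, s) - m(p, s) * m(q, r)


def gamma(alpha, beta, u, v):
    """gamma_{alpha beta}(u, v): alpha copies of u and beta copies of v (alpha + beta = 4)."""
    return cum4(*([u] * alpha + [v] * beta))


rng = np.random.default_rng(6)
x1, x2 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 400_000))
a12, a21, g = 0.4, 0.3, -1.2                  # g: kurtosis of each unit-variance source
y1, y2 = x1 + a12 * x2, a21 * x1 + x2         # sensor outputs, eq. (1)

estimates = {"13": gamma(1, 3, y1, y2), "31": gamma(3, 1, y1, y2), "22": gamma(2, 2, y1, y2)}
theory = {"13": a21**3 * g + a12 * g,         # eq. (11)
          "31": a21 * g + a12**3 * g,         # eq. (12)
          "22": a21**2 * g + a12**2 * g}      # eq. (13)
```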


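Next, substituting (7)-(8) into (4)-(6), and using (9)-(13), we obtain (after some algebra)

$\gamma_{31}(s_1, s_2) = (3b_{12}^2 + b_{12}^3b_{21})\,\gamma_{13}(y_1, y_2) + (1 + 3b_{12}b_{21})\,\gamma_{31}(y_1, y_2) + 3(b_{12} + b_{12}^2b_{21})\,\gamma_{22}(y_1, y_2) + b_{21}\,\gamma_{y_1} + b_{12}^3\,\gamma_{y_2}$

$\gamma_{13}(s_1, s_2) = (1 + 3b_{21}b_{12})\,\gamma_{13}(y_1, y_2) + (3b_{21}^2 + b_{21}^3b_{12})\,\gamma_{31}(y_1, y_2) + 3(b_{21} + b_{21}^2b_{12})\,\gamma_{22}(y_1, y_2) + b_{21}^3\,\gamma_{y_1} + b_{12}\,\gamma_{y_2}$

$\gamma_{22}(s_1, s_2) = 2(b_{12} + b_{12}^2b_{21})\,\gamma_{13}(y_1, y_2) + 2(b_{21} + b_{21}^2b_{12})\,\gamma_{31}(y_1, y_2) + (1 + 4b_{12}b_{21} + b_{12}^2b_{21}^2)\,\gamma_{22}(y_1, y_2) + b_{21}^2\,\gamma_{y_1} + b_{12}^2\,\gamma_{y_2}$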

A cumulant cancelation algorithm [2] may now be used to solve the separation problem. Assuming that the signs of the source kurtoses are the same, the problem is to solve simultaneously $\gamma_{31}(s_1, s_2) = 0$ and $\gamma_{13}(s_1, s_2) = 0$, with spurious solutions removed by also requiring $\gamma_{22}(s_1, s_2) = 0$. For this purpose, a Newton-Raphson method or a quasi-Newton (global) method is well suited.
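The closed-form expressions and the Newton-Raphson search mentioned in the text can be sketched together in Python/NumPy. The sketch uses exact (model) sensor cumulants from (9)-(13) instead of estimates, checks that the expressions agree with (4)-(8), and then solves $\gamma_{31}(s_1,s_2) = \gamma_{13}(s_1,s_2) = 0$ from $b_{12} = b_{21} = 0$ with a finite-difference Jacobian. The mixing values and the iteration count are arbitrary. Convergence to solution (i) is only expected from a starting point close to it, as in this weak-mixing example.

```python
import numpy as np


def sensor_cumulants(a12, a21, g1, g2):
    """Eqs. (9)-(13): kurtosis and cross-cumulants of y1, y2."""
    return {"gy1": g1 + a12**4 * g2, "gy2": a21**4 * g1 + g2,
            "g13": a21**3 * g1 + a12 * g2, "g31": a21 * g1 + a12**3 * g2,
            "g22": a21**2 * g1 + a12**2 * g2}


def output_cumulants(b12, b21, c):
    """gamma_31, gamma_13, gamma_22 of (s1, s2) in terms of sensor cumulants."""
    g31 = ((3 * b12**2 + b12**3 * b21) * c["g13"] + (1 + 3 * b12 * b21) * c["g31"]
           + 3 * (b12 + b12**2 * b21) * c["g22"] + b21 * c["gy1"] + b12**3 * c["gy2"])
    g13 = ((1 + 3 * b21 * b12) * c["g13"] + (3 * b21**2 + b21**3 * b12) * c["g31"]
           + 3 * (b21 + b21**2 * b12) * c["g22"] + b21**3 * c["gy1"] + b12 * c["gy2"])
    g22 = (2 * (b12 + b12**2 * b21) * c["g13"] + 2 * (b21 + b21**2 * b12) * c["g31"]
           + (1 + 4 * b12 * b21 + b12**2 * b21**2) * c["g22"]
           + b21**2 * c["gy1"] + b12**2 * c["gy2"])
    return np.array([g31, g13, g22])


def via_h(a12, a21, b12, b21, g1, g2):
    """The same three cumulants from eqs. (4)-(8)."""
    h11, h22, h12, h21 = 1 + a21 * b12, 1 + a12 * b21, a12 + b12, a21 + b21
    return np.array([h11**3 * h21 * g1 + h12**3 * h22 * g2,
                     h11 * h21**3 * g1 + h12 * h22**3 * g2,
                     h11**2 * h21**2 * g1 + h12**2 * h22**2 * g2])


def newton_separation(c, iters=30, h=1e-7):
    """Newton-Raphson on (gamma_31, gamma_13) = 0, starting from b = 0."""
    b = np.zeros(2)
    for _ in range(iters):
        f = output_cumulants(b[0], b[1], c)[:2]
        J = np.empty((2, 2))
        for j in range(2):
            e = np.zeros(2)
            e[j] = h
            J[:, j] = (output_cumulants(*(b + e), c)[:2]
                       - output_cumulants(*(b - e), c)[:2]) / (2 * h)
        b = b - np.linalg.solve(J, f)
    return b


a12, a21, g1, g2 = 0.2, 0.1, -1.2, -1.2
c = sensor_cumulants(a12, a21, g1, g2)
b_sol = newton_separation(c)
```

At the separating point, $\gamma_{22}(s_1, s_2)$ also vanishes, so it can serve as the check against spurious roots mentioned in the text.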



Figure 1: Feed-forward structure. 



3. SEPARATION OF MULTIPLE SOURCES 


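Suppose we have $P$ independent non-Gaussian sources $x_1[n], \dots, x_P[n]$ which are mixed through the matrix of causal FIR filters $A(z)$ to give the mixed signals $y_1[n], \dots, y_P[n]$. Let the transfer function $H(z)$ be the product of the separation and mixing filter matrices:

$H(z) = B(z)\,A(z) = \begin{pmatrix} 1 & \cdots & B_{1P}(z)\\ \vdots & \ddots & \vdots\\ B_{P1}(z) & \cdots & 1\end{pmatrix}\begin{pmatrix} 1 & \cdots & A_{1P}(z)\\ \vdots & \ddots & \vdots\\ A_{P1}(z) & \cdots & 1\end{pmatrix}$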

The filter matrices $A(z)$ and $B(z)$ are assumed to have unit diagonal elements, while $C(z)$ has zero diagonal elements, in order to avoid output indeterminacy in the order of the sources.

If we only want to separate the source signals up to a shaping factor, depending on the structure chosen, it is only necessary to seek a diagonal transfer function $H(z) = B(z)\,A(z)$ or $H(z) = (I + C(z))^{-1}A(z)$. When $H(z)$ is diagonal, separation is achieved, and we call this transfer function matrix $H_d(z)$.

If we want to recover the original sources and we use the feed-forward structure, we have to remove the signal distortions introduced in the separation process, so the signals $V(z)$ need to be post-filtered through $H_d^{-1}(z)$ to obtain the estimated sources $S(z)$. In the case of the feed-backward structure this post-processing is not needed, since $(I + C(z))^{-1}$ can be set to the inverse of the mixing matrix $A(z)$.

Once at the solution, the following relation holds:

$B(z) = H_d(z)\,A^{-1}(z) \quad (14)$

Equating diagonal terms, we can obtain the elements of $H_d(z)$ as

$H_{d,ii}(z) = \frac{\det B(z)}{\mathrm{Adj}_{ii}(B(z))} = \frac{\det A(z)}{\mathrm{Adj}_{ii}(A(z))} \quad (15)$
This is an approximate expression that relates $H_d(z)$ and $B(z)$ when we are close to the separation solution. Substituting $H_d(z)$ into equation (14), we obtain for each element of $B(z)$ the expression

$B_{ij}(z) = \frac{\mathrm{Adj}_{ij}(A^T(z))}{\mathrm{Adj}_{ii}(A(z))} \quad (16)$

which can help us determine the number of taps the filter needs to make the solution of the separation problem possible.
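For an instantaneous (constant) $3 \times 3$ mixing matrix, (15) and (16) can be checked numerically. The Python/NumPy sketch below reads $\mathrm{Adj}_{ij}(M)$ as the $(i,j)$ cofactor of $M$, which is the reading that reproduces $B_{ij} = -A_{ij}$ for two sources. The matrix entries are arbitrary. For convolutive mixtures, the same expressions would involve cofactors of polynomial (z-domain) matrices.

```python
import numpy as np


def cofactor(M, i, j):
    """(i, j) cofactor of M: signed determinant of M without row i and column j."""
    minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)


A = np.array([[1.0, 0.3, -0.2],
              [0.25, 1.0, 0.1],
              [-0.15, 0.2, 1.0]])            # unit-diagonal mixing matrix (assumed)
P = A.shape[0]

# Eq. (16): B_ij = Adj_ij(A^T) / Adj_ii(A)
B = np.array([[cofactor(A.T, i, j) / cofactor(A, i, i) for j in range(P)]
              for i in range(P)])
H = B @ A                                     # should be the diagonal H_d

# Eq. (15): both expressions for the diagonal of H_d
hd_from_A = [np.linalg.det(A) / cofactor(A, i, i) for i in range(P)]
hd_from_B = [np.linalg.det(B) / cofactor(B, i, i) for i in range(P)]
```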

When there are only two sources, expression (16) simplifies to $B_{ij}(z) = -A_{ij}(z)$, so $B(z)$ only needs the same number of coefficients as $A(z)$. On the other hand, when there are more than two sources, $B_{ij}(z)$ is in general a rational expression, so it needs an infinitely long impulse response. As long as we use FIR filters in $B(z)$ we cannot find the exact solution to the problem, but we can get as close to it as our degrees of freedom allow. An exception to this argument is the case of instantaneous mixtures, where the solution can be achieved with only one coefficient per filter. The feed-backward case does not have this kind of problem, because the solution is given by $C(z) = A(z) - I$, and $C(z)$ only needs the same number of coefficients as the filters in $A(z)$.


4. THE MINIMIZATION ALGORITHM 

The separation problem could be solved by canceling all cross-cumulants of the signals, but this is not a feasible approach. We therefore try to cancel only a set of cross-cumulants, in the hope that, in general, if the set is not too small, this will lead to a unique solution that coincides with the separation. In order to cancel the cross-cumulants, we use a minimization approach based on a functional that is a weighted quadratic sum of a set of cross-cumulants chosen among all pair combinations of output signals of the form $(v_i[n], v_j[n-k])_{i \ne j}$. Then

$\phi(B) = \sum_{(\alpha,\beta)\in\Omega}\ \sum_{i=1}^{P}\ \sum_{\substack{j=1\\ j \ne i}}^{P}\ \sum_{k=0}^{L-1} w_{\alpha\beta}\,\gamma_{\alpha\beta}^2\big(v_i[n], v_j[n-k]\big) \quad (17)$

where $w_{\alpha\beta}$ are the weights, $\gamma_{\alpha\beta}$ are the cross-cumulants, and $B$ is the vector of all filter coefficients $\{b_{ij}[k] \mid k = 0, \dots, L-1;\ i, j = 1, \dots, P,\ i \ne j\}$.

Minimization is done through a quasi-Newton algorithm that searches for the zeros of the gradient of $\phi$. The iteration of the algorithm can be written as

$B[n+1] = B[n] - \mu\,\Delta[n] \quad (18)$

$H_B(\phi)\,\Delta[n] = \nabla_B\phi \quad (19)$

where $H_B(\phi)$ is the Hessian matrix, $\nabla_B\phi$ is the gradient vector and $\mu$ is a diagonal matrix of adaptation steps. The Hessian and the gradient can both be approximated by their dominant terms around the separation solution.

Figure 3: Mixing and separation filters, in the z domain, for three independent non-Gaussian sources.

4.1. CHOOSING THE CUMULANTS SET 

Minimization through the previous algorithm relies on having a good estimate of the gradient and the Hessian of $\phi$ around the solution to the separation problem. This condition does not hold for every kind of cumulant: it can be shown that cross-cumulants with both $\alpha$ and $\beta$ greater than or equal to two lead to a null second-order approximation of the gradient of $\phi$, so they cannot be used with this method. When $\alpha$ and $\beta$ are both chosen equal to 1 we have the decorrelation criterion; this problem was studied in [3], but the gradient estimate can be shown to be less robust than with other sets of cumulants, and the asymptotic stability of the algorithm can only be guaranteed in simplified cases. Only two possible classes of sets then remain: cumulants of the form $\gamma_{1\beta}$ and cumulants of the form $\gamma_{\alpha 1}$. We choose the first set ($\gamma_{1\beta}$), since it will be shown that conditions ensuring asymptotic stability of the algorithm can be derived for it, while the same does not hold for the other set.

It is interesting to note that for instantaneous mixtures both sets lead to the same solution. With our set, however, the Hessian matrix becomes diagonal and the gradients are correctly assigned to each coefficient variable, while for the other set the Hessian matrix takes the form of a permutation matrix, which corrects the assignment between gradients and variables.

4.2. ASYMPTOTIC STABILITY 

We derive the algorithm assuming either instantaneous mixtures and non-Gaussian independent sources, or convolutive mixtures and non-Gaussian, independent, $(1 + \beta_{max})$-order white sources. We also consider causal FIR filters for the sake of simplicity, but the algorithm can easily be extended to non-causal FIR filters with a few additional considerations. Under these assumptions, the output cross-cumulant between two signals $v_i[n]$ and $v_j[n-k]$ has the following relation with the cumulants of the input signals:

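$\gamma_{\alpha\beta}\big(v_i[n], v_j[n-k]\big) = \sum_{p=1}^{P}\sum_{l=0}^{\infty} h_{ip}^{\alpha}[k+l]\,h_{jp}^{\beta}[l]\,\gamma_{\alpha+\beta}\big(x_p[n-k-l]\big) \quad (20)$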

Substituting (20) into equation (17) and taking the dominant terms of the gradient vector and Hessian matrix around the solution, we obtain the diagonal terms

$\frac{\partial^2\phi}{\partial b_{ij}[k]^2} \approx 2\sum_{(1,\beta)\in\Omega} w_{1\beta}\,\gamma_{1+\beta}^2\big(x_j[n-k]\big)\sum_{l=0}^{k} h_{jj}^{2\beta}[k-l] \quad (21)$

of the Hessian matrix. Since these terms dominate (this holds for normal mixtures, where the diagonal terms are always greater than the rest), we can approximate the Hessian by a diagonal matrix. The $h_{jj}[k-l]$ terms can be estimated through equation (15) from the vector of coefficients $B$, while the input cumulants $\gamma_{1+\beta}(x_j[n-k])$ are replaced by the output cumulants $\gamma_{1+\beta}(s_j[n-k])$. Then the adaptation algorithm simplifies to

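$b_{ij}[k] \leftarrow b_{ij}[k] - \mu_{ijk}\,\dfrac{\sum_{(1,\beta)\in\Omega} w_{1\beta}\,\gamma_{1+\beta}(s_j[n-k])\sum_{l=0}^{k} h_{jj}^{\beta}[k-l]\,\gamma_{1\beta}\big(v_i[n], v_j[n-l]\big)}{\sum_{(1,\beta)\in\Omega} w_{1\beta}\,\gamma_{1+\beta}^2(s_j[n-k])\sum_{l=0}^{k} h_{jj}^{2\beta}[k-l]} \quad (24)$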

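for the convolutive case, while with instantaneous mixtures the method can be simplified further. Around the solution, the dominant second derivatives are

$\frac{\partial^2\phi}{\partial b_{ij}[k]\,\partial b_{ij}[m]} \approx 2\sum_{(1,\beta)\in\Omega} w_{1\beta}\,\gamma_{1+\beta}\big(x_j[n-k]\big)\,\gamma_{1+\beta}\big(x_j[n-m]\big)\sum_{l=0}^{\min(k,m)} h_{jj}^{\beta}[k-l]\,h_{jj}^{\beta}[m-l] \quad (22)$

while the remaining derivatives can be approximated by zero. Then, from relation (22), it can be seen that the Hessian has an upper triangular structure, and the asymptotic stability of the algorithm can be ensured with the following condition:

$0 < \mu_{ijk}\,\gamma_{1+\beta}^2\big(x_j[n-k]\big)\sum_{l=0}^{k} h_{jj}^{2\beta}[k-l] < 2$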

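The fastest convergence rate is reached when the adaptation step is

$\mu_{ijk} = \Big(\gamma_{1+\beta}^2\big(x_j[n-k]\big)\sum_{l=0}^{k} h_{jj}^{2\beta}[k-l]\Big)^{-1}$

but, since we may make errors in the estimates of the terms $\gamma_{1+\beta}(x_j[n-k])$, it is preferable to use $\mu_{ijk} = \mu_0\big(\gamma_{1+\beta}^2(x_j[n-k])\sum_{l=0}^{k} h_{jj}^{2\beta}[k-l]\big)^{-1}$, with $0 < \mu_0 < 2$ chosen in a way that always ensures the asymptotic stability of the algorithm in the worst case.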

4.3. SIMPLIFICATIONS 

When we have instantaneous mixtures, the Hessian matrix is diagonal, so it is trivial to invert and the method is easily stated. With convolutive mixtures we can make a simplification in order to avoid solving equation (19). Assuming dominance of the diagonal elements,


= bij - /io hjj[Q] ‘ 


5. THE CANCELATION ALGORITHM 

A cumulant cancelation algorithm was proposed in [2] to solve the separation problem. We extend this algorithm to the case of multiple sources as

$$b_{ij}[k](n+1) = b_{ij}[k](n) - \mu_{ijk}\, \gamma_{1\beta}\big(u_i[n],\, u_j[n-k]\big) \qquad (26)$$

From the same arguments used in subsection 4.1, we conclude that our best choice for the set of cumulants to cancel is of the form $\gamma_{1\beta}$, since this leads to a well-structured Hessian and hence to a correct assignment between variables and gradient terms. As long as our proposed set of cumulants is used, the Hessian matrix will be upper triangular, and asymptotic stability of the algorithm requires only the following condition:

$$0 < \mu_{ijk}\, h_{jj}^{\beta}[0]\, \gamma_{1+\beta}\big(x_j[n-k]\big) < 2, \qquad i, j = 1, \ldots, P, \quad k = 0, \ldots, L-1 \qquad (27)$$

which ensures a contractive iteration toward the fixed point that constitutes the solution. This condition can be satisfied by adjusting the step sizes $\mu_{ijk}$ as
$$\mu_{ijk} = \frac{\mu_0}{h_{jj}^{\beta}[0]\, \gamma_{1+\beta}\big(x_j[n-k]\big)} \qquad (28)$$

where $0 < \mu_0 < 2$. The more precise our estimates are, the closer to 1 we can set $\mu_0$; at $\mu_0 = 1$ the maximum speed of convergence is reached.

Figure 4: Coefficients of the global transfer function vs. iterations.

Approximating the input source cumulants of $x_j[n-k]$ by the corresponding output cumulants, we arrive at the cancelation algorithm




7lg(u,[n],Uj[rt-fc]) 

hjj[0]7pisj[n-k]) 


(29) 


It can be seen that the cancelation algorithm we derived is very similar to a minimization algorithm of a quadratic functional with one chosen cumulant, and the two coincide in the case of instantaneous mixtures.
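For instantaneous mixtures the cancelation idea can be tried on synthetic data. The following is a minimal sketch under our own assumptions, not the authors' exact update (29): only $\gamma_{13}$ ($\beta = 3$) is cancelled, the separating matrix B keeps a unit diagonal, and each off-diagonal $b_{ij}$ takes a Newton-like step normalised by the output fourth-order cumulant $\gamma_4(u_j)$, which amounts to approximating the unknown factor $h_{jj}/a_{jj}$ by 1. The function names, the 2x2 mixing matrix and the sample size are hypothetical choices for illustration; $\mu_0 = 0.8$ is taken from the example in this paper.

```python
import numpy as np


def cum13(a, b):
    """Sample cross-cumulant cum(a, b, b, b), i.e. gamma_13 of two signals."""
    a = a - a.mean()
    b = b - b.mean()
    return np.mean(a * b**3) - 3.0 * np.mean(a * b) * np.mean(b**2)


def cancel_gamma13(y, mu0=0.8, n_iter=50):
    """Cumulant-cancelation sketch for an instantaneous mixture y (P x N).

    B keeps a unit diagonal; every off-diagonal b_ij is pushed toward
    gamma_13(u_i, u_j) = 0 with a step normalised by gamma_4(u_j).
    """
    n_src = y.shape[0]
    B = np.eye(n_src)
    for _ in range(n_iter):
        u = B @ y
        g4 = [cum13(u[j], u[j]) for j in range(n_src)]  # gamma_4(u_j)
        step = np.zeros_like(B)
        for i in range(n_src):
            for j in range(n_src):
                if i != j:
                    step[i, j] = mu0 * cum13(u[i], u[j]) / g4[j]
        B = B - step
    return B


# Two i.i.d. uniform (sub-Gaussian) sources and a hypothetical 2x2 mixing matrix.
rng = np.random.default_rng(0)
x = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20000))
A = np.array([[1.0, 0.4], [0.3, 1.0]])
B = cancel_gamma13(A @ x)
H = B @ A  # global transfer matrix: off-diagonal terms should end up small
print(np.round(H, 2))
```

Because the step is normalised by the output kurtosis, its sign is correct for both sub- and super-Gaussian sources, and near the solution the residual leakage shrinks roughly by the factor $|1 - \mu_0/h_{jj}|$ per iteration, in line with the role of $\mu_0$ discussed above.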


6. SEPARATION EXAMPLE

In this section we present an example that corroborates the performance of the proposed minimization and cancelation algorithms. In the simulations we use the set of cumulants $\gamma_{13}$ and $\gamma_{15}$, since they are easy to derive.

For instantaneous mixtures and stationary signals, we implement an algorithm that makes a robust estimate of the cumulants from 500 outputs in each iteration. Five source signals of 600 samples each are chosen to be i.i.d. white uniform noise. The stabilization step size is $\mu_0 = 0.8$, and the weighting coefficients are $w_{13} = \ldots = 1$ and $\ldots = 0.25$.

The mixing matrix is:

$$A = \begin{pmatrix}
1 & -0.42 & -0.57 & 0.36 & -0.27 \\
0.35 & 1 & -0.87 & -0.18 & -0.30 \\
-0.51 & 0.47 & 1 & -0.16 & 0.75 \\
-0.51 & 0.35 & -0.64 & 1 & 0.03 \\
-0.82 & 0.63 & -0.23 & 0.82 & 1
\end{pmatrix}$$

Separation is reached near the 30th iteration, as can be seen in Figure 4, where the coefficients of the global transfer function are plotted versus iterations. The final global transfer function after 100 iterations is:

$$\begin{pmatrix}
1.04 & -0.06 & -0.05 & -0.05 & 0.00 \\
0.03 & 1.01 & 0.02 & 0.01 & -0.01 \\
0.02 & -0.01 & 0.95 & -0.02 & -0.01 \\
-0.05 & 0.05 & -0.01 & 0.96 & 0.03 \\
-0.01 & 0.00 & 0.02 & 0.00 & 0.98
\end{pmatrix}$$

Figure 5: Minimization of $\phi$ vs. iterations.

The minimization of the cumulant functional $\phi$ is shown in Figure 5. The algorithm lowers the value of $\phi$ by more than five orders of magnitude. The descent stops when the accuracy of the cumulant estimates becomes insufficient; in our example this happens near the 30th iteration.

We have also implemented a stochastic algorithm with lower computational complexity that also works well with non-stationary signals such as speech.

7. SUMMARY

In this paper we have studied the problem of blind separation of independent non-Gaussian sources. We have presented a method based on a robust cross-cumulant estimation criterion for the case of instantaneous mixtures of two sources. We have derived two algorithms for solving the generalized problem of multiple sources in convolutive mixtures: a minimization algorithm for the quadratic sum of a given set of cross-cumulants, and the cancelation algorithm. We have found conditions for the asymptotic stability of both algorithms and given examples of convergence that corroborate their good performance. Future research will focus on the global stability of the proposed algorithms.
ABSTRACT

In this paper an adaptive correlator receiver for a DS-CDMA system is presented. It is well suited to a mobile radio propagation environment characterised by a long delay spread of the channel. A decision feedback, which can increase the performance of high-speed transmission systems, can easily be implemented in such a receiver.

1. INTRODUCTION

In the past few years some mobile radio systems exploiting the Code Division Multiple Access technique have been proposed in the US (IS-95) and Europe (CODIT). They offer high capacity, soft handover capability, immunity to channel fading and many other features important in a cellular environment. In a spread-spectrum communication system the well-known RAKE receiver can be applied; it distinguishes different propagation paths and in effect realizes time diversity reception [1]. If the receiver is made adaptive, it can follow the channel variations and combine the energy of the transmitted signal using a limited number of arms. In a DS-CDMA mobile radio system a dedicated pilot channel can easily be applied in the downlink, which can be used for estimation of the channel parameters: the delays of the strongest paths as well as the phase shift and the amplitude of each path. Tracking the phase shift and amplitude of the received signal is relatively simple and can be done in each arm independently using a very simple estimator [2]. However, estimation of the path delays must be done in a separate block, which finds a given number of the strongest paths in a time window of fixed length [3].

In this paper we investigate an adaptive correlator receiver, a structure equivalent to the RAKE receiver, and subsequently we study a new modification of it resulting from the introduction of a decision feedback. The structure of the adaptive correlator is well suited to the mobile radio propagation environment, where the delay spread can be as long as 30-50 μs and there are many propagation paths.

The paper is structured as follows. First, the mobile radio channel is briefly characterised. Next, the structure of the adaptive correlator receiver is presented along with its decision feedback version. Finally, the performance of the receivers is verified by computer simulation, and the results are shown in terms of BER versus Eb/N0.


2. THE MOBILE RADIO PROPAGATION 
ENVIRONMENT 

In a mobile radio propagation environment a transmitted signal is severely corrupted by Doppler shift, shadowing effects and the multipath phenomenon. For each path the amplitude, phase rotation and delay of the received signal are time-varying. The multipath phenomenon results from reflections and diffractions, which are quite severe in urban and hilly environments. As a result, the received signal is corrupted by intersymbol interference, which limits the effective bit rate of a transmission system that does not exploit decision feedback or a Viterbi detector in the receiver. The multipath phenomenon also causes frequency-selective fading. The fades can be 20-30 dB deep and result in burst errors. They can be combatted by implementing FEC coding (block and/or convolutional) and interleaving. Due to the movement of the mobile station, the received signal exhibits a Doppler spread. For a system working at 900 MHz and a mobile speed of 250 km/h, the Doppler shift can be as large as 200 Hz.
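The 200 Hz figure follows from the maximum Doppler shift $f_d = v f_c / c$. A quick check, where the function name and the constant are our own choices:

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def max_doppler_hz(speed_kmh, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c for a mobile moving at speed_kmh."""
    v = speed_kmh / 3.6  # km/h -> m/s
    return v * carrier_hz / SPEED_OF_LIGHT


print(round(max_doppler_hz(250, 900e6)))  # 208, i.e. roughly 200 Hz
```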


3. THE ADAPTIVE CORRELATOR 
RECEIVER 

Due to implementation constraints, the adaptive RAKE receiver contains no more than 3-4 branches. As a result, it tracks only a few of the strongest paths, and the part of the transmitted signal energy carried by the other paths is neglected. The adaptive correlator inherently exploits all the paths with delays falling within the range specified by the time span of the channel estimator used in the receiver. The basic idea of the receiver is as follows (see Fig. 1). The estimator calculates the



Fig. 1 The block diagram of the adaptive correlator receiver


instantaneous channel impulse response estimate $\hat{h}$. Its coefficients are low-pass filtered to remove the effect of noise. The resulting coefficients $\bar{h}$ are then used for the synthesis of a reference signal in an FIR filter driven by the despreading sequence u, generated locally in the receiver. The reference signal, which is an estimate of the received pilot signal, is correlated with the received data signal, and the output of the correlator is sampled once every transmitted bit. The receiver operates according to the following formula:


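$$\hat{b}_m = \operatorname{dec}\,\mathrm{Re}\left[\frac{1}{N_c}\sum_{i=0}^{N_c-1} r_{mN_c+i}\, c^{*}_{mN_c+i}\right] = \operatorname{dec}\,\mathrm{Re}\left[\frac{1}{N_c}\sum_{i=0}^{N_c-1}\sum_{j=0}^{L-1} r_{mN_c+i}\, \bar{h}^{*}_{j}\, u^{*}_{mN_c+i-j}\right]$$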


where N_c is the number of chips per bit (or processing gain). Tracking of the channel is relatively simple since, in the investigated system based on the Qualcomm proposal [4], the pilot signal is transmitted in parallel with the user signals, and channel estimation can be done using an adaptive or correlative channel estimator.
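The correlator formula can be sketched at chip level as follows. This is an illustration under our own assumptions, not the authors' simulator: real BPSK data, a chip-spaced channel, a perfect filtered estimate $\bar{h}$ equal to the true channel, no noise, and a delay spread much shorter than a bit. The reference signal is the locally generated despreading sequence passed through the FIR filter $\bar{h}$; the function name is hypothetical.

```python
import numpy as np


def correlator_decisions(r, u, h_bar, n_c):
    """Adaptive-correlator detector (sketch).

    r     -- received chip samples, length n_bits * n_c
    u     -- local despreading sequence, same length as r
    h_bar -- filtered channel impulse response estimate (chip-spaced taps)
    n_c   -- chips per bit (processing gain)
    """
    # Reference signal c[k] = sum_j h_bar[j] * u[k - j] (causal FIR filtering).
    c = np.convolve(u, h_bar)[: len(u)]
    n_bits = len(r) // n_c
    decisions = np.empty(n_bits, dtype=int)
    for m in range(n_bits):
        seg = slice(m * n_c, (m + 1) * n_c)
        z = np.sum(r[seg] * np.conj(c[seg])) / n_c  # correlate over one bit
        decisions[m] = 1 if np.real(z) >= 0 else -1  # decision on the real part
    return decisions


# Short three-path channel, 64 chips per bit, noiseless reception.
rng = np.random.default_rng(1)
n_c, n_bits = 64, 200
h = np.array([1.0, 0.5, 0.3])
bits = rng.choice([-1, 1], size=n_bits)
u = rng.choice([-1.0, 1.0], size=n_c * n_bits)
r = np.convolve(np.repeat(bits, n_c) * u, h)[: n_c * n_bits]
print(np.count_nonzero(correlator_decisions(r, u, h, n_c) != bits))
```

Every path whose delay falls inside the span of $\bar{h}$ contributes to the reference, which is why no explicit path-delay search is needed.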

The RAKE receiver (as well as the adaptive correlator receiver) has been derived under the assumption that the delay spread of the channel is much smaller than the bit duration, so that the intersymbol interference due to multipath may be neglected [1]. This is not fully true in real mobile radio channels. For instance, in the DS-CDMA mobile radio system proposed by Qualcomm [4], the transmitted bit at the basic chip rate of 1.228 Mcps is 128 · 1/(1228.8 kHz) ≈ 100 μs long. If the delay spread is equal to 20 μs (which is typical for hilly terrain), the intersymbol interference spans 20% of the bit duration. However, at the higher chip rate of 5 Mcps the transmitted bit is only ≈20 μs long, which is of the order of the delay spread. Thus, the intersymbol interference resulting from the previously transmitted bit must be taken into account. While this is not straightforward in the RAKE receiver, in our adaptive correlator a one-bit decision feedback can be implemented in a relatively simple way (see Fig. 2). A feedback signal FIR filter (FSF) must be added, which reconstructs the interference using the previously decided bit, the channel impulse response estimate and the appropriate part of the despreading sequence. The reconstructed interference is then subtracted from the received signal carrying information about the subsequent bit. The receiver works according to the following algorithm:

1. the TDL of the feedback signal filter is filled with zeros;

2. the TDL of the despreading sequence filter (DSF) is fed with the despreading sequence chips, and the resulting reference signal is correlated with the received signal samples;

3. after N_c chips the decision is made; the decided bit is taken into account during the next correlation period;

4. the contents of the TDL of the DSF are copied into the TDL of the FSF, and the TDL of the DSF is filled with zeros;

5. the reconstructed interference signal is subtracted from the following samples of the received signal (belonging to the next bit), which are then correlated with the reference signal while the TDL of the FSF is fed with zeros;

6. go to step 3.

The algorithm is managed by the control unit (see Fig. 2).
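The six steps can be mimicked at chip level as below. This is a sketch only, under our own assumptions: real BPSK bits, a chip-spaced channel estimate equal to the true channel, no noise, and a delay spread of at most one bit (len(h̄) - 1 ≤ N_c). The DSF and FSF tapped delay lines are represented by array slices rather than explicit registers, and the function name is hypothetical.

```python
import numpy as np


def df_correlator_decisions(r, u, h_bar, n_c):
    """Decision-feedback adaptive correlator (chip-level sketch of steps 1-6)."""
    n_tail = len(h_bar) - 1
    if n_tail > n_c:
        raise ValueError("channel longer than one bit is not handled in this sketch")
    n_bits = len(r) // n_c
    decisions = np.empty(n_bits, dtype=int)
    fsf = np.zeros(n_tail)  # step 1: FSF delay line starts filled with zeros
    for m in range(n_bits):
        seg = slice(m * n_c, (m + 1) * n_c)
        # Step 2: the DSF starts empty and is fed only with this bit's chips.
        full = np.convolve(u[seg], h_bar)  # length n_c + n_tail
        ref = full[:n_c]
        # Step 5: subtract the interference reconstructed from the previous decision.
        x = np.array(r[seg], dtype=complex)
        x[:n_tail] -= fsf
        # Step 3: correlate over N_c chips and decide.
        z = np.sum(x * np.conj(ref)) / n_c
        decisions[m] = 1 if z.real >= 0 else -1
        # Step 4: the part of this bit's reference that spills into the next bit,
        # weighted by the decided bit, becomes the FSF output.
        fsf = decisions[m] * full[n_c:]
    return decisions


# 8 chips per bit and a 7-tap channel: the delay spread covers 75% of a bit.
rng = np.random.default_rng(2)
n_c, n_bits = 8, 300
h = np.array([1.0, 0.8, -0.6, 0.5, 0.4, -0.3, 0.3])
bits = rng.choice([-1, 1], size=n_bits)
u = rng.choice([-1.0, 1.0], size=n_c * n_bits)
r = np.convolve(np.repeat(bits, n_c) * u, h)[: n_c * n_bits]
print(np.count_nonzero(df_correlator_decisions(r, u, h, n_c) != bits))  # 0
```

In this idealised noiseless setting, once the previous decision is correct the subtraction removes the inter-bit interference exactly, so the remaining correlation is proportional to the current bit; with noise or estimation errors, wrong decisions would propagate into the feedback.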



Fig. 2 The block diagram of the decision feedback adaptive correlator receiver


4. SIMULATION RESULTS 

In order to verify the performance of the proposed receivers, computer simulations have been performed. The simulated system was based on the Qualcomm proposal [4], which employs two-stage despreading using Walsh functions depending on a channel number and a pseudo-noise sequence characterising a given base station. As a result, the despreading sequence u used in the adaptive correlator receiver is a product of the two sequences. The ... based on a GSM recommendation [5]. The channel estimator implemented in the receiver used ...

The obtained results show that the adaptive correlator receiver with a 30-tap estimator covering ≈24 μs of delay spread performs in an HT channel at the basic chip rate of 1.228 Mcps identically to the 4-arm adaptive RAKE receiver with the delays adjusted in a time window of the same length (Fig. 3). The gain in comparison to a general 4-arm RAKE receiver with constant delays between the arms is 3.5-4 dB at BER = 3e-3. The decision feedback version of the adaptive correlator gives basically the same results, since the intersymbol interference spans only about 20% of the bit duration at this chip rate.


Fig. 3 The performance of the adaptive correlator receiver at 1 Mcps chip rate (curves: no estim., ARAKE, ad. corr., df. ad. corr.)

The advantages of the decision feedback adaptive correlator receiver can be seen at the higher chip rate of 4.912 Mcps, where the intersymbol interference spans 70% of the bit duration in the same propagation conditions (Fig. 4). The gain is 2.5 dB at BER = 3e-3 in comparison to the receiver without decision feedback. The only disadvantage in this case is the length of the estimator, which must have 100 taps to cover the whole delay spread of the channel.

Fig. 4 The performance of the adaptive correlator receiver at 5 Mcps chip rate (BER curves for the adaptive correlator with and without decision feedback)

Fig. 5 The performance of the adaptive correlator receiver in the presence of intracell interference (curves: 1 user, 10 users, 26 users)

The performance of the adaptive correlator in the presence of intracell interference is presented in Fig. 5. The degradation due to the increasing number of users in the same cell is comparable to the degradation exhibited by the RAKE receiver.


5. CONCLUSIONS 

The presented adaptive correlator receiver may be applied in transmission systems characterised by a long delay spread of the channel. Although it is more complicated than the adaptive RAKE receiver with comparable performance, it has one important feature: a decision feedback can easily be implemented, which is indispensable in high-speed transmission systems, e.g. DS-CDMA mobile radio systems with a chip rate of the order of 5 Mcps.

REFERENCES 

[1] J. G. Proakis, "Digital Communications", McGraw-Hill, 1983

[2] R. Krenz, F. Muratore, G. Romano, "Channel Estimation for a DS-CDMA Mobile Radio System with a Coherent Reception", Proc. IEEE VTC '94, vol. 2, pp. 724-728, 1994

[3] R. Krenz, "Adaptive Receivers For DS-CDMA Mobile Radio Systems", Proc. XIth International Microwave Conference MIKON '96, Warszawa 1996

[4] A. Salmasi, K. S. Gilhousen, "On the System Design Aspects of Code Division Multiple Access Applied to Digital Cellular and Personal Communications Networks", Proc. IEEE ICC '91, pp. 57-62, 1991

[5] ETSI-GSM Technical Specification, series 05.05, "Transmission and Reception", June 1991

















ON THE ROBUST SPR CONDITION IN ADAPTIVE RECURSIVE SCHEMES 

Carlos Mosquera and Fernando Perez 


Dept. Tecnologias de las Comunicaciones. 
ETSI Telecomunicacion, Universidad de Vigo, 
36200 Vigo, SPAIN 


ABSTRACT 

The problem of finding an LTI filter making a set of plants Strictly Positive Real is presented, given its fundamental role in the convergence of an important class of adaptive recursive algorithms based on hyperstability concepts. The problem is solved in some important cases with applications in many different contexts.

1. INTRODUCTION 

Adaptive infinite-impulse response (IIR) filters are desirable in many situations as an alternative to adaptive finite-impulse response (FIR) filters, owing to their reduced complexity and improved performance. Important applications include adaptive noise canceling, channel equalization, adaptive differential pulse code modulation, etc. Adaptive techniques for IIR filters have been under investigation in recent years, in many cases taking results from the system identification field, due to the similarities between both areas. Convergence of the algorithms has been the main issue throughout this process; error surfaces are in most cases multimodal, and the analysis of convergence of gradient-based techniques becomes quite hard [1], with convergence to the global minimum not guaranteed in many cases. In addition, most of those procedures need stability monitoring; otherwise the algorithm may diverge during the adaptation stage. Spurred by the convergence problems, other investigators have borrowed from the control field some tools based on hyperstability, which allow the design of algorithms with proven convergence, provided that a Strictly Positive Real (SPR) condition is satisfied. Such a condition involves the poles of the system under study, which are either unknown or only partially known. Although various suggestions have been made in order to relax the SPR condition, none of them is completely satisfactory, given their suboptimality with output disturbance in some cases [2],[3], or the conditions imposed on the input in other cases [4].


This paper deals with the robust SPR problem: designing a compensator that makes the whole set of possible denominators of the system SPR in those cases where the uncertainty is known, without altering the performance of the algorithm or imposing conditions on the input. Some numerical results will show how convergence can be achieved when the appropriate compensator C(z), obtained with our synthesis procedure, is used in the adaptive algorithm.

2. ADAPTIVE IIR ALGORITHMS

Many adaptive IIR filtering problems may be addressed in a system identification framework, in which a reference model is hypothesized. The unknown transfer function is assumed rational, and the objective is to construct a rational approximation to the transfer function based on the input-output measurements, which are usually noise-corrupted. Figure 1 shows the adaptive filter in a system identification configuration, where θ is the unknown parameter vector. The goal is the minimization of a performance criterion of the error e(n). Traditionally there have been two main approaches to the adaptive IIR filtering problem, which correspond to different formulations of the error: equation error and output error methods. We will not consider the first family of methods here; equation error methods have well-understood properties, given their similarity to adaptive FIR methods. Their main drawback is the bias in the estimate when a disturbance v(n) is present in the output.

The output error adaptive IIR filter is characterized by the following recursive equation:

$$y(n) = \sum_{k=1}^{N} a_k(n)\, y(n-k) + \sum_{k=0}^{M} b_k(n)\, u(n-k) \qquad (1)$$

with $a_k(n)$ and $b_k(n)$ the adaptive parameters of the filter.

The output feedback is the reason why this algorithm is more complex than its equation error counterpart, in which feedback is done with the output of the unknown transfer function.

Figure 1: System Identification Configuration

Two main approaches can be taken to the output error problem [5]: the minimization (gradient descent) viewpoint and the stability theory viewpoint. The minimization approach leads to a gradient descent formulation. One of its drawbacks is the possible multimodality of the error surface being descended, although in some cases it can be shown to be unimodal [6]. However, the main concern in this approach is the need for on-line stability monitoring of the time-varying AR filter, which filters the regressor vector and is recomputed at each stage as the estimate of the denominator of the transfer function.

An alternative approach for adapting the parameters of the IIR filter is based on the theory of hyperstability [7], a concept that was developed for the stability analysis of time-varying nonlinear feedback systems. The adaptive IIR process can be viewed as a linear system with time-varying nonlinear feedback, and the adaptation law is chosen so as to ensure that the resulting closed-loop configuration is hyperstable, and hence convergent. This type of algorithm uses a version of the output error filtered by C(z), an optional compensator intended to satisfy the SPR condition given below. No stability checking is required. However, to ensure global convergence, the following SPR condition must be satisfied:

$$\mathrm{Re}\left\{\frac{C(z)}{A(z)} - \frac{\gamma}{2}\right\} > 0 \quad \text{for all } |z| = 1 \qquad (2)$$

where Re{·} denotes the real part, and $A(z) = 1 - \sum_{k=1}^{N} \alpha_k z^{-k}$ is the denominator of the unknown plant. The scalar γ depends on the specifics of the adaptive algorithm. For the Hyperstable Adaptive Recursive Filter (HARF) and its simplified version (SHARF) [8], two of the main adaptive IIR algorithms based on hyperstability concepts, γ = 0 suffices. Other, more complex algorithms, such as the Pseudolinear Regression algorithm (PLR), require a nonzero γ, the same as in the original algorithm based on hyperstability ideas, developed by Landau [9] for identification purposes and adapted to adaptive IIR filtering by Johnson [7].
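A minimal sketch of an output-error adaptive IIR filter following (1), with a SHARF-style update in which the parameters move along the regressor, scaled by the output error smoothed through the compensator C(z). We assume the commonly quoted form θ(n+1) = θ(n) + μ ν(n) φ(n) with ν(n) = Σ_i c_i e(n-i); the absence of normalisation, the zero initialisation and the function name are our own choices, not details taken from [8]. In the usage example C(z) = 1 and the plant is 1/(1 - 0.5 z^-1), for which 1/A(z) is SPR.

```python
import numpy as np


def sharf_identify(u, d, n_a, n_b, mu, c=(1.0,)):
    """Output-error IIR identification with a SHARF-style update (sketch)."""
    a = np.zeros(n_a)       # feedback coefficients a_k, k = 1..N
    b = np.zeros(n_b + 1)   # feedforward coefficients b_k, k = 0..M
    y = np.zeros(len(u))    # adaptive filter output, eq. (1)
    e = np.zeros(len(u))    # output error e(n) = d(n) - y(n)
    for n in range(len(u)):
        phi_y = np.array([y[n - k] if n >= k else 0.0 for k in range(1, n_a + 1)])
        phi_u = np.array([u[n - k] if n >= k else 0.0 for k in range(n_b + 1)])
        y[n] = a @ phi_y + b @ phi_u
        e[n] = d[n] - y[n]
        # Error smoothed by the compensator C(z) = c_0 + c_1 z^-1 + ...
        nu = sum(c[i] * e[n - i] for i in range(len(c)) if n >= i)
        a += mu * nu * phi_y
        b += mu * nu * phi_u
    return a, b


# Noise-free identification of the plant 1 / (1 - 0.5 z^-1).
rng = np.random.default_rng(0)
u = rng.standard_normal(20000)
d = np.zeros_like(u)
for n in range(len(u)):
    d[n] = u[n] + (0.5 * d[n - 1] if n > 0 else 0.0)
a, b = sharf_identify(u, d, n_a=1, n_b=0, mu=0.01)
print(np.round(a, 3), np.round(b, 3))
```

No stability monitoring of the recursive part appears anywhere in the loop; convergence here rests on the SPR property of C(z)/A(z), which is exactly what the compensator is meant to secure when A(z) is uncertain.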

The main drawback of this type of algorithm is that satisfaction of the SPR condition is critical for proper algorithm behavior, although convergence can be achieved in some cases even when the condition is not satisfied [10]. Yet there is no general method to eliminate the condition entirely, despite the efforts made in that direction [2],[3],[4].
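The SPR condition can be checked numerically on a dense frequency grid. This is only a sketch under stated assumptions: polynomials are given by their coefficients in powers of z^-1, the condition is read as Re{C(z)/A(z) - γ/2} > 0 on |z| = 1 together with stability of A(z) and C(z), and a finite grid can only suggest, not prove, SPRness. The function names are ours. The example uses a plant with a double pole at z = 0.9, which lies inside the disk of poles considered in the experimental section, and the compensator reported there.

```python
import numpy as np


def poly_on_circle(coeffs, w):
    """Evaluate sum_k coeffs[k] * z**(-k) at z = exp(j * w)."""
    k = np.arange(len(coeffs))
    return np.exp(-1j * np.outer(w, k)) @ np.asarray(coeffs, dtype=float)


def is_spr_on_grid(a_poly, c_poly=(1.0,), gamma=0.0, n_grid=4096):
    """Grid test of Re{C/A - gamma/2} > 0 on |z| = 1, plus stability of A and C."""
    # Coefficients of z^-k, read by np.roots, give the roots of z^n A(z).
    if np.any(np.abs(np.roots(a_poly)) >= 1.0) or np.any(np.abs(np.roots(c_poly)) >= 1.0):
        return False
    w = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    h = poly_on_circle(c_poly, w) / poly_on_circle(a_poly, w)
    return bool(np.min(h.real - gamma / 2.0) > 0.0)


A_double = [1.0, -1.8, 0.81]      # double pole at z = 0.9
C_paper = [1.0, -1.0013, 0.0619]  # compensator from the experimental section
print(is_spr_on_grid(A_double))           # False: 1/A(z) is not SPR
print(is_spr_on_grid(A_double, C_paper))  # True with the compensator
```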

The use of the compensator C(z) requires an a priori estimate of A(z) to satisfy the SPR condition. A first approach would be to start with a different identification scheme, such as least squares, to obtain an approximation of A(z) [11]. In other cases, an a priori confidence set can be established for either the coefficients [12] or the roots of A(z) [13]. In both cases, satisfaction of the SPR condition for the whole uncertainty set reduces to the study of a finite number of transfer functions. The following section shows how a C(z) can be found to make the whole uncertainty set SPR in some interesting cases, including cases with nonparametric uncertainty.

3. THE ROBUST SPR CONDITION 

The robust SPR problem was raised by Dasgupta and Bhagwat [14] and then by Anderson et al. in [12]. In the latter paper they provided a necessary and sufficient condition for the existence of a compensator C(z) simultaneously making several polynomials $A_i(z)$ SPR, namely

$$\max_{\omega \in [0, 2\pi)} \left| \arg A_i(e^{j\omega}) - \arg A_j(e^{j\omega}) \right| < \pi, \qquad \forall\, i, j \qquad (3)$$

and they described a method to obtain a minimum-phase FIR C(z) making several plants SPR, but with a numerical procedure that does not provide an a priori bound for the degree of the design.
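The phase condition (3) lends itself to a direct numerical test. The sketch below is our own illustration, assuming real-coefficient minimum-phase polynomials in powers of z^-1: for such polynomials the ratio $A_i/A_j$ is positive at ω = 0 and has no net phase winding, so an unwrapped phase of the ratio on a dense grid tracks the continuous phase difference. A finite grid is an approximation, and the function names are hypothetical.

```python
import numpy as np


def poly_on_circle(coeffs, w):
    """Evaluate sum_k coeffs[k] * z**(-k) at z = exp(j * w)."""
    k = np.arange(len(coeffs))
    return np.exp(-1j * np.outer(w, k)) @ np.asarray(coeffs, dtype=float)


def max_phase_gap(a1, a2, n_grid=8192):
    """Largest |arg A1(e^jw) - arg A2(e^jw)| over a frequency grid, in radians."""
    w = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    ratio = poly_on_circle(a1, w) / poly_on_circle(a2, w)
    # Unwrap so the phase difference is followed continuously around the circle.
    gap = np.unwrap(np.angle(ratio))
    return float(np.max(np.abs(gap)))


def pairwise_condition_3(polys):
    """True if every pair of polynomials satisfies the phase condition (3)."""
    return all(max_phase_gap(p, q) < np.pi
               for i, p in enumerate(polys) for q in polys[i + 1:])


print(pairwise_condition_3([[1.0, -0.5], [1.0, 0.5]]))  # True
# (1 - 0.9 z^-1)^3 vs (1 + 0.9 z^-1)^3: phase gap of about 252 degrees at w = pi/2.
print(pairwise_condition_3([[1.0, -2.7, 2.43, -0.729], [1.0, 2.7, 2.43, 0.729]]))  # False
```

When the test fails for some pair, no single compensator can make both plants SPR, so the check is a cheap first screening before attempting any synthesis.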

We take a different approach here, trying to obtain practical procedures for the synthesis of the compensator C(z). We first present the case of the simultaneous “SPRization” of a two-member family. There are two main reasons for considering this type of set. First, it provides much insight into other simultaneous SPRization problems, in the same way as the simultaneous stabilization problem does. Second, there are quite a few sets for which finding C(z) making all the members SPR is equivalent to finding C(z) making two specific members SPR, namely straight line segments in parameter space, disks in root space, horizontal line segments in root space and vertical line segments in root space [13].

Let us consider two minimum-phase polynomials $A_1(z)$ and $A_2(z)$, with the following definitions:

$$Q(z) = \frac{C(z)A_1(z^{-1}) + C(z^{-1})A_1(z)}{2} \qquad (4)$$

$$R(z) = \frac{C(z)A_2(z^{-1}) + C(z^{-1})A_2(z)}{2} \qquad (5)$$

$$D(z) = \frac{A_1(z)A_2(z^{-1}) - A_1(z^{-1})A_2(z)}{2} \qquad (6)$$

$$E(z) = \frac{A_1(z)A_2(z^{-1}) + A_1(z^{-1})A_2(z)}{2} \qquad (7)$$


We have designed an algebraic procedure to obtain C(z) for the two-plant case, bounding the degree of the solution in terms of the number of roots of D(z) in (6) on the unit circle. The condition for the existence of C(z) is as follows:

Theorem 1 A causal and minimum-phase C(z) making $A_1(z)$ and $A_2(z)$ SPR exists if and only if the polynomial E(z) is real and positive at the roots of D(z) on the unit circle [15].


The construction of such a C(z) can be done by finding two symmetric positive functions Q(z) and R(z) which, as can be noted from (4) and (5), share their sign with the real part of $C(z)/A_1(z)$ and $C(z)/A_2(z)$, respectively. With the definitions above we can obtain C(z) as

$$C(z) = \frac{R(z)A_1(z) - Q(z)A_2(z)}{D(z)} \qquad (8)$$

with R(z) and Q(z) symmetric positive functions satisfying the following interpolation conditions in order to cancel the roots of the denominator:

$$R(a_i)A_1(a_i) - Q(a_i)A_2(a_i) = 0 \qquad (9)$$

with $\{a_i\}$ the roots of D(z). The resulting filter C(z) could be non-minimum-phase (and non-causal), depending on the degree of the interpolating functions R(z) and Q(z). It can be made causal and minimum-phase by dividing it by a symmetric positive function containing the roots outside the unit circle and their reciprocals.
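The construction in (8) rests on the algebraic identity $R(z)A_1(z) - Q(z)A_2(z) = C(z)D(z)$, which follows by expanding the definitions (4)-(6); the three-plant expression for S(z) introduced next rests on the analogous identity $S(z)D_{12}(z) = R(z)D_{13}(z) - Q(z)D_{23}(z)$. The sketch below, our own illustration with hypothetical names, checks both numerically on the unit circle, where for real coefficients $X(z^{-1})$ equals the complex conjugate of $X(z)$. It only verifies the identities; it does not perform the interpolation that actually yields a valid C(z).

```python
import numpy as np


def poly_on_circle(coeffs, w):
    """Evaluate sum_k coeffs[k] * z**(-k) at z = exp(j * w)."""
    k = np.arange(len(coeffs))
    return np.exp(-1j * np.outer(w, k)) @ np.asarray(coeffs, dtype=float)


def check_identities(a1, a2, a3, c, n_grid=512):
    """Numerically verify R*A1 - Q*A2 = C*D12 and S*D12 = R*D13 - Q*D23 on |z| = 1."""
    w = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    A1, A2, A3, C = (poly_on_circle(p, w) for p in (a1, a2, a3, c))

    def sym(X):
        # (C(z) X(z^-1) + C(z^-1) X(z)) / 2, as in (4), (5) and the S(z) definition.
        return (C * np.conj(X) + np.conj(C) * X) / 2

    def dij(X, Y):
        # (X(z) Y(z^-1) - X(z^-1) Y(z)) / 2, as in (6).
        return (X * np.conj(Y) - np.conj(X) * Y) / 2

    Q, R, S = sym(A1), sym(A2), sym(A3)
    ok8 = np.allclose(R * A1 - Q * A2, C * dij(A1, A2))
    ok11 = np.allclose(S * dij(A1, A2), R * dij(A1, A3) - Q * dij(A2, A3))
    return bool(ok8 and ok11)


print(check_identities([1.0, -0.5], [1.0, 0.3, 0.2], [1.0, -0.9], [1.0, 0.1]))  # True
```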

To build R(z) and Q(z) we have derived a recursive algorithm based on the Youla-Saito interpolation algorithm for SPR functions in the Laplace transform domain.

The following corollary indicates the cases in which a minimum-phase FIR C(z) is obtained with our design algorithm:

Corollary 1 If the number of roots of D(z) on the unit circle is less than or equal to four, then an FIR C(z) can be computed.

For instance, in the case of disks in root space mentioned above, D(z) is a polynomial with only two roots on the unit circle, so an FIR compensator is obtained. It is worth noting that the previous corollary gives a sufficient, but not necessary, condition for the absence of poles of C(z) in $\mathbb{C}\setminus\{0\}$.

The extension of these results to the three-plant case involves the use of an additional symmetric positive polynomial S(z) defined for $A_3(z)$:

$$S(z) = \frac{C(z)A_3(z^{-1}) + C(z^{-1})A_3(z)}{2} \qquad (10)$$

If C(z) is obtained from (8), then S(z) can be expressed as

$$S(z) = \frac{R(z)D_{13}(z) - Q(z)D_{23}(z)}{D_{12}(z)} \qquad (11)$$

with the new polynomials $D_{ij}$ defined as $D_{ij}(z) = \big(A_i(z)A_j(z^{-1}) - A_i(z^{-1})A_j(z)\big)/2$.

Now, in addition to the interpolation conditions on R(z) and Q(z) shown above, some additional constraints are imposed by the need for S(z) to be SPR as well:


Q(z) ^ Aijz) 

R(z) A 2 iz) 

when $D_{12}(z) = 0$ on the unit circle and


( 12 ) 


Q(^) / (13) 

R(z) ^ D23 {z) 

at the rest of the unit circle. Note that = 

at the roots of D^iz) on the unit circle (except 
1 and -1). 


Thus, the filter that simultaneously makes three plants SPR has the same general form as that for two plants, but some additional conditions (avoidance conditions) must be imposed when obtaining Q(z) and R(z) in (8).

The problem becomes more difficult when dealing with four or more plants: new avoidance conditions arise with each new plant, while the interpolation conditions are kept. This is a multiple avoidance problem, whose solution would provide a compensator C(z) simultaneously making all the plants SPR.

The extension of these results to the strengthened case does not seem an easy task. In [16] the strengthened robust SPR problem was defined as the search for a monic minimum-phase C(z) such that $C(z)/A(z) - \gamma/2$ is SPR for the whole uncertainty set. The problem was
solved for the continuous-time case, with some comments about the difficulty of its discrete-time counterpart. It was proved that the necessary and sufficient condition of the robust SPR problem (γ = 0) expressed in (3) is also necessary and sufficient for the general case (0 < γ < 1). In the two-plant discrete-time case, a monic compensator for the case 0 < γ < 1 can be found, again based on interpolation conditions. The open problem remains how to obtain a minimum-phase compensator. In [17] a necessary condition is provided for the solvability of the discrete-time robust SPR problem, showing that the results of [16] cannot be extended to the discrete domain.

However, for adaptive algorithms it seems more interesting to derive a compensator C(z) with some constraints on its norm, either $\|C\|_2$ or $\|C\|_\infty$, for disturbance rejection purposes. This is also an open research problem, given the difficulty of combining those constraints with simultaneous positivity.

Nonparametric uncertainty can also be handled in some cases. If we define the set to be made SPR as U(A(z), K(z)), with

$$A(z) = A_0(z) + \Delta A(z), \qquad |\Delta A(e^{j\omega})| < |K(e^{j\omega})| \qquad (14)$$

for all ω ∈ [0, 2π), then the following lemma can be stated:

Lemma 1 If the set U(A(z), K(z)) is stable, then the transfer function $C(z) = A_0(z)$ makes the whole set SPR.

The lemma is illustrated in Fig. 2, where L(z) denotes the bound for the additive term ΔA(z). Stability of the set U(A(z), K(z)), provided that $A_0(z)$ is stable, is equivalent to the so-called zero exclusion condition [18], if no degree reduction exists. In that case, $(A_0(z) + \Delta A(z))/C(z)$ will be different from zero for all ω, so the SPRness of that term and of its inverse follows.

The following section will illustrate with a practical 
example how the previous results can be applied. 



Figure 2: Making SPR a set with nonparametric uncertainty

4. EXPERIMENTAL RESULTS 

Figures 3 and 4 show the parameter trajectories when the HARF algorithm is used to identify a second-order plant whose poles are known to lie in a circle centered at 0.6 with radius 0.35. Using the design algorithm presented in Section 3, the compensator $C(z) = 1 - 1.0013z^{-1} + 0.0619z^{-2}$ was obtained. Convergence is achieved for a signal-to-noise ratio of 20 dB and a step size μ = 0.01 if C(z) is used (Figure 3). Otherwise, the parameters do not converge to the correct values (Figure 4).



Figure 3: Parameter trajectories when SPRness is satisfied (horizontal axis in thousands of iterations)

Figure 4: Parameter trajectories when SPRness is not satisfied

5. CONCLUSIONS

This paper has presented the links between the robust SPR condition and an important family of adaptive recursive schemes, namely those based on hyperstability concepts, which require the satisfaction of the SPR property in order to ensure global convergence. An algorithm was provided to design a linear time-invariant filter such that the robust SPR problem is solved for two plants, with some insights into the extension of that solution to more complex cases, such as a higher number of plants and the strengthened robust SPR problem. The robust problem with unstructured uncertainty, for which the number of possible plants is infinite, was also solved.

6. REFERENCES 

[1] S. D. Stearns. Error surfaces of recursive adaptive filters. IEEE Transactions on Circuits and Systems, CAS-28:603-606, June 1981.

[2] I. D. Landau. Elimination of the real positivity condition in the design of parallel MRAS. IEEE Transactions on Automatic Control, 23:1015-1020, December 1978.

[3] A. Betser and E. Zeheb. Modified output error identification - elimination of the SPR condition. IEEE Transactions on Automatic Control, 40:190-193, January 1995.

[4] M. Tomizuka. Parallel MRAS without compensation block. IEEE Transactions on Automatic Control, 27:505-506, April 1982.

[5] C. R. Johnson Jr. Adaptive IIR filtering: Current results and open issues. IEEE Transactions on Information Theory, 30:237-250, March 1984.

[6] M. Nayeri. A weaker sufficient condition for the unimodality of error surfaces associated with exactly matching adaptive IIR filters. In 22nd Asilomar Conference on Signals, Systems and Computers, pages 35-38, November 1988.

[7] C. R. Johnson Jr. A convergence proof for a hyperstable adaptive recursive filter. IEEE Trans. on Information Theory, 25:745-759, November 1979.

[8] M. G. Larimore, J. R. Treichler, and C. R. Johnson Jr. SHARF: An algorithm for adapting IIR digital filters. IEEE Trans. on Acoustics, Speech and Signal Processing, 28:428-440, August 1980.

[9] I. D. Landau. Unbiased recursive identification using model reference adaptive techniques. IEEE Trans. on Automatic Control, 21:194-202, April 1976.

[10] P. A. Regalia. Adaptive IIR Filtering in Signal Processing and Control. Marcel Dekker, 1995.

[11] L. Ljung. On positive real transfer functions and the convergence of some recursive schemes. IEEE Trans. on Automatic Control, AC-22:539-551, August 1977.

[12] B. D. O. Anderson, S. Dasgupta, P. Khargonekar, F. J. Kraus, and M. Mansour. Robust strict positive realness: Characterization and construction. IEEE Trans. on Circuits and Systems, 37:869-876, July 1990.

[13] F. Perez and C. Abdallah. Phase-convex arcs in root space and its application to robust SPR problems. In Proc. 33rd IEEE Conf. Dec. and Control, Orlando, FL, pages 3729-3730, 1994.

[14] S. Dasgupta and A. Bhagwat. Conditions for designing strictly positive real transfer functions for adaptive output error identification. IEEE Trans. on Circuits and Systems, 34:731-736, July 1987.

[15] F. Perez and C. Mosquera. Algebraic LTI filter 
synthesis for simultaneously making a convex com¬ 
bination of discrete-time plants SPR. In Proc. 34 th 
IEEE Conf Dec. and Control, New Orleans, LA, 
pages 780-781, 1995. 

[16] B.D.O. Anderson and I.D. Landau. Least squares 
identification and the robust strict positive real 
property. IEEE Transactions on Circuits and Sys¬ 
tems I, 41:601-607, September 1994. 

[17] C. Mosquera and F. Perez. A necessary condition 
for the strengthened robust SPR problem. Tech¬ 
nical Report DTC/CMN/300696, Universidad de 
Vigo, May 1996. 

[18] B. R. Barmish. New tools for robustness of linear 
systems. Macmillan, 1994. 





AN EM APPROACH TO CHANNEL EQUALIZATION WITH MODULAR NETWORKS


Jesus Cid-Sueiro, Johny Ghattas
ETSI Telecomunicacion. Univ. Valladolid. Spain 


ABSTRACT 

In this paper we discuss the application of the Expectation-Maximization (EM) algorithm to the equalization of digital communication channels with modular neural networks, extending the learning approach discussed by Jacobs and Jordan in [Jor91]. We present a novel algorithm which converges faster than stochastic gradient rules at a moderate cost, and which can be applied to learning in both supervised and blind mode. Finally, we discuss the elimination of the hidden variables in the algorithm by means of some previous symbols of the training sequence. Simulation results support the final conclusions.

1. INTRODUCTION 

It is well known in the communications field that the equalization of digital communication channels is a non-linear problem, even if the channel distortion is linear. Several authors have shown that adaptive non-linear systems based on neural networks can reduce the bit error rate of conventional detectors ([Cid95], [Gel93], [Kec94] and [Mul96] are just some examples). However, it is not easy to keep both the computational cost and the training time moderate.

In this paper we go further in the application of modular networks to the symbol detection problem, extending some previous work [Gel93][Cid94][Cid95]. This kind of network exploits the idea of partitioning the input space into subregions so as to assign different modules, or experts, to each region; working in this way, the learning problem is divided into simpler tasks. The Hierarchical Mixture of Experts (HME) architecture is based on a soft partition of the space, and it has demonstrated better performance than backpropagation networks in equalization and other mapping problems [Cid95][Jor91]. In [Cid95], several cost functions for stochastic gradient learning were compared, concluding that the logarithmic cost is the most adequate in an equalization application. The stochastic learning rules are simple, but they usually converge slowly. Jacobs and Jordan [Jor91] have shown that the Expectation-Maximization (EM) algorithm of Dempster et al. [Dem77] can be applied to these structures, reducing the training time, but at a very high computational load. They also propose an on-line algorithm, based on the application of Recursive Least Squares (RLS) algorithms to every module of the HME, that can be applied to regression problems; unfortunately, it performs poorly in classification tasks.

The EM algorithm can be simplified if the global maximization step is replaced by a local maximization; in this paper we show that the resulting algorithm converges faster than the stochastic minimization of the logarithmic cost function. Moreover, the proposed method can be extended to non-supervised learning, resulting in soft-decision-directed rules similar to those proposed by Nowlan in [Now93], which have shown better convergence properties than decision-directed methods in blind equalization.

When the EM algorithm is applied to train modular classifiers, the current symbol is used as a reference for computing the error measurement. However, the references for the gating nets have to be constructed from a statistical data model. This paper concludes by showing that previous symbols of the channel can be used as references for the gating nets. This allows the application of RLS algorithms to symbol detection.

2. THE HME NETWORK 

Fig. 1 shows the structure of a 3-level HME network. Its output is given by

$$y = \sum_{i=0}^{1} g_i \sum_{j=0}^{1} g_{j|i}\, y_{ij} \qquad (1)$$

where $y_{ij}$, $g_{j|i}$ and $g_i$ are the expert outputs, the first-level gating nets and the top-level gating net, respectively, with $g_{0|i} = 1 - g_{1|i}$ and $g_0 = 1 - g_1$. We assume that both expert and gating nets are linear filters with a sigmoidal activation unit; thus

$$y_{ij} = \mathrm{sigm}(\mathbf{w}_{ij}^T \mathbf{x}) \qquad (2)$$

$$g_{1|i} = \mathrm{sigm}(\mathbf{v}_i^T \mathbf{x}) \qquad (3)$$

$$g_1 = \mathrm{sigm}(\mathbf{v}^T \mathbf{x}) \qquad (4)$$

where $\mathbf{w}_{ij}$, $\mathbf{v}_i$ and $\mathbf{v}$ are the weights of the different modules.

Figure 1: A Hierarchical Mixture of Experts (HME) with three levels

Although a 3-level network is assumed in the following, the final results can be generalized to a higher number of levels.

3. THE EM APPROACH

Let us assume that the desired output d (the transmitted symbol in an equalization application) is generated according to one of 4 possible Bernoulli distributions with parameters $\pi_{ij}$ ($i = 0,1$, $j = 0,1$). Following [Jor91], we define indicator variables $z_{ij}$ such that $z_{ij} = 0$ for every $(i,j)$ except for the selected model $(k,l)$, for which $z_{kl} = 1$. The Bernoulli distribution of the selected model can be expressed as follows,

$$P\left(d \mid \mathbf{x}, z_{ij} = \delta_{i-k}\,\delta_{j-l}\right) = \pi_{kl}^{d}\,(1-\pi_{kl})^{1-d} \qquad (5)$$

where $\pi_{ij}$ are the $(i,j)$-Bernoulli distribution parameters. Therefore, we can write

$$P(d, \mathbf{z} \mid \mathbf{x}) = \prod_{i}\prod_{j}\left(P(z_{ij} \mid \mathbf{x})\, P(d \mid \mathbf{w}_{ij}, \mathbf{x})\right)^{z_{ij}} \qquad (6)$$

Assume that the output $y_{ij}$ of the $(i,j)$-expert of an HME architecture is an estimate of $\pi_{ij}$, and that the gating net outputs are used to estimate the conditional probabilities $P(z_{ij} = 1 \mid \mathbf{x})$ in such a way that $g_{ij} = g_i\, g_{j|i} = P(z_{ij} = 1 \mid \mathbf{x})$. It is easy to see that $P(d \mid \mathbf{x}) = y^{d}(1-y)^{1-d}$, where y is just the output of the HME network. In such a case, it is straightforward to see that the HME becomes the optimal Bayesian detector.

To compute the network weights $\mathbf{w}_{ij}$, $\mathbf{v}_i$ and $\mathbf{v}$ leading to such estimates, we take the logarithm of the probability model in Eq. (6),

$$L(d, \mathbf{x}) = \sum_{i}\sum_{j} z_{ij}\left(\ln g_i + \ln g_{j|i} + \ln P(d \mid \mathbf{w}_{ij}, \mathbf{x})\right) \qquad (7)$$

If the indicator variables were known, the HME weights could be estimated by maximizing the log-likelihood function in Eq. (7). Unfortunately, this is not the case. The EM algorithm of Dempster et al. [Dem77] solves this problem by taking the expectation of $L(d, \mathbf{x})$ (E step) with respect to the so-called hidden variables $z_{ij}$ before its maximization (M step). It can be shown [Jor91] that

$$E\{L \mid \mathbf{x}, \mathbf{w}_{ij}, \mathbf{v}_i, \mathbf{v}, d\} = \sum_{i}\sum_{j} h_{ij} \ln\left(g_i\, g_{j|i}\, P(d \mid \mathbf{w}_{ij}, \mathbf{x})\right) \qquad (8)$$

$$h_{ij} = \frac{g_{ij}\,(1 - d - y_{ij})}{1 - d - y} \qquad (9)$$

For $d \in \{0,1\}$, Eq. (9) is the posterior probability $P(z_{ij} = 1 \mid d, \mathbf{x})$: it equals $g_{ij} y_{ij}/y$ when $d = 1$ and $g_{ij}(1 - y_{ij})/(1 - y)$ when $d = 0$.

The expected log-likelihood is a non-linear function of the weights. To avoid the computational load required by its complete maximization (M step), we replace the global maximization step by a local one, computing a single iteration of a stochastic gradient search. In [Jor91], this is done assuming that $h_{ij}$ in Eq. (8) is independent of the network parameters; the resulting algorithm is equivalent to a stochastic gradient minimization of a logarithmic cost function. However, it is clear from Eq. (9) that $h_{ij}$ depends on the network weights. The learning rules resulting when this dependence is taken into account are derived in Appendix A, and shown below:

$$\begin{aligned} \Delta\mathbf{w}_{ij} &= \rho\, f_{ij}\,(d - y_{ij})\,\mathbf{x} \\ \Delta\mathbf{v}_i &= \rho\,(f_{i1} - f_i\, g_{1|i})\,\mathbf{x} \\ \Delta\mathbf{v} &= \rho\,(f_1 - g_1)\,\mathbf{x} \end{aligned} \qquad (10)$$

where $\rho$ is the adaptation step and

$$f_{ij} = h_{ij}\,(1 + \varepsilon_{ij}), \qquad f_i = \sum_{j} f_{ij} \qquad (11)$$

Variables $\varepsilon_{ij}$ are defined as

$$\varepsilon_{ij} = \ln h_{ij} - \sum_{k}\sum_{l} h_{kl} \ln h_{kl} \qquad (12)$$

and they make the difference between the previous algorithm and the minimization of a logarithmic cost, which is obtained by assuming $\varepsilon_{ij} = 0$.

Figure 2: Evolution of the BER vs time, using a logarithmic cost (continuous) and with the proposed method (dashed).

As an example, Fig. 2 compares the convergence vs time of both algorithms when training a 4-level HME architecture to equalize the linear non-minimum-phase channel with transfer function H(z) = 0.5 + … for 15 dB SNR. The inputs to the network are the last 2 received samples. The adaptation step was optimized independently for each algorithm. The resulting EM algorithm converges significantly faster.

Although it is not observed in the figure, we found that the final bit error rate of the proposed scheme is higher than that obtained with the logarithmic cost. This suggests using the EM-based LMS algorithm for a fast start-up and, after that, switching to the stochastic minimization of the log cost. As we have seen, this can be done simply by making $\varepsilon_{ij} = 0$ in Eq. (11).
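The supervised update of Eqs. (10)-(12) can be sketched in NumPy for the 3-level HME of Fig. 1. This is an illustrative sketch under our own assumptions, not the authors' code: the array layout (W of shape (2, 2, n), V1 of shape (2, n), v of shape (n,)), the function names and the 1e-12 floor inside the logarithm are hypothetical choices. Setting `use_eps=False` zeroes ε_ij and therefore gives the logarithmic-cost rule mentioned above.

```python
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def hme_forward(W, V1, v, x):
    """3-level HME: expert outputs y_ij, gates g_{j|i} and g_i, and output y."""
    y_ij = sigm(W @ x)                           # expert outputs, shape (2, 2)
    g1_i = sigm(V1 @ x)                          # g_{1|i}, shape (2,)
    g_ji = np.stack([1.0 - g1_i, g1_i], axis=1)  # g_ji[i, j] = g_{j|i}
    g1 = sigm(v @ x)                             # top gate g_1
    g_i = np.array([1.0 - g1, g1])
    g_ij = g_i[:, None] * g_ji                   # g_ij = g_i g_{j|i}
    y = float(np.sum(g_ij * y_ij))               # HME output
    return y_ij, g_ji, g_i, g_ij, y

def em_lms_step(W, V1, v, x, d, rho, use_eps=True):
    """One local M-step (single gradient iteration) for a training pair (x, d)."""
    y_ij, g_ji, g_i, g_ij, y = hme_forward(W, V1, v, x)
    h = g_ij * (1.0 - d - y_ij) / (1.0 - d - y)         # posteriors, sum to 1
    if use_eps:
        log_h = np.log(np.maximum(h, 1e-12))            # guard against log(0)
        eps = log_h - np.sum(h * log_h)                 # Eq. (12)
    else:
        eps = np.zeros_like(h)                          # logarithmic-cost rule
    f_ij = h * (1.0 + eps)                              # Eq. (11)
    f_i = f_ij.sum(axis=1)
    # Learning rules of Eq. (10)
    W = W + rho * (f_ij * (d - y_ij))[:, :, None] * x
    V1 = V1 + rho * (f_ij[:, 1] - f_i * g_ji[:, 1])[:, None] * x
    v = v + rho * (f_i[1] - g_i[1]) * x
    return W, V1, v
```

Because all updates use only the current pair (x, d), the cost per step stays comparable to that of the plain stochastic-gradient rule; the extra work is the computation of ε_ij.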

3.1 BLIND EQUALIZATION USING THE EM APPROACH

Although this line is not explored in this paper, we note that a non-supervised EM algorithm that can be applied to blind equalization can be derived from Eq. (7) by assuming that the desired response, d, is also a hidden variable. Applying expectations over it, we derive the algorithm for an N-level HME network in App. B. The resulting algorithm is directed by soft decisions, in a way similar to that of Nowlan and Hinton [Now93], which has shown better performance than decision-directed methods in tracking abrupt channel variations.

4. LIGHTING THE HIDDEN VARIABLES 

Each level of the HME network makes consecutive soft partitions of the input space. Note that, in the EM approach, the hidden variables $z_{ij}$ determine which expert network should make the symbol decision. The EM algorithm deals with the lack of knowledge about these hidden variables by computing their expected values. However, if they were known, they could be used as output references for the network modules, and each module could be trained separately in a more efficient way. In an equalization application, a possible way to do so is to use the past training symbols as references, as we describe below.

Let us consider, for example, a 3-level HME network. We train the gating net in the top level (level 2) using symbol $x_{k-2}$ as its output reference, the level-1 modules with reference $x_{k-1}$, and the level-0 modules (i.e. the expert networks) using $x_k$. In each level only one module is trained at a time, depending on the value of the past training symbols, which leads to a lower computational cost. In the same 3-level example, in level 2 the unique gating net is always trained, while in level 1 the first module is trained only if $x_{k-2} = 0$ and the other one only if $x_{k-2} = 1$; in level 0, expert 1 is trained only if $x_{k-2}x_{k-1} = 00$, expert 2 only if $x_{k-2}x_{k-1} = 01$, expert 3 only if $x_{k-2}x_{k-1} = 10$ and expert 4 only if $x_{k-2}x_{k-1} = 11$.
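The symbol-to-module assignment just described can be written as a small helper for an arbitrary number of levels. The function name and the binary ordering of modules (expert 1 for 00, expert 2 for 01, and so on) follow the 3-level example in the text; extending the same rule to other depths is our own assumption.

```python
def training_assignments(bits, levels):
    """Hypothetical helper: which module each HME level trains at time k.

    bits: the last `levels` training symbols [x_{k-levels+1}, ..., x_k], each 0 or 1.
    Returns (level, module, reference) triples, top level first; modules are
    numbered from 1 within each level, as in the text.
    """
    if len(bits) != levels:
        raise ValueError("need one training symbol per level")
    assignments = []
    for depth in range(levels):          # depth 0 is the top gating net
        level = levels - 1 - depth
        prefix = bits[:depth]            # older symbols select the branch
        module = 1 + (int("".join(str(b) for b in prefix), 2) if prefix else 0)
        assignments.append((level, module, bits[depth]))  # reference symbol
    return assignments

print(training_assignments([1, 0, 1], levels=3))
# [(2, 1, 1), (1, 2, 0), (0, 3, 1)]
```

Only one module per level appears in the output, which is why the training cost grows linearly with the number of levels.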

Working in this way, each module (a sigmoidal perceptron) can be trained separately; moreover, as only one filter at each level is updated at every training step, the computational load is reduced: it increases linearly with the number of levels (i.e. with the logarithm of the number of modules). Also, by making a hard partition of the input space (replacing the soft decision devices by hard slicers), each module becomes a linear equalizer which can be trained using standard LMS or RLS algorithms.

Fig. 3 shows that using the previous symbols of the training sequence to update the gating nets offers significantly faster convergence and a lower bit error rate than the methods based on hidden variables. Linear channel H(z) = 0.5 + …, with 15 dB signal-to-noise ratio.

Figure 3: Evolution of the bit error probability during training for RLS and LMS with different references for each level and no hidden variables (h.v.), vs the EM-LMS and the standard stochastic gradient search with a logarithmic cost. The curves are the average of 400 simulations, after smoothing with a low-pass filter.


5. CONCLUSIONS AND FURTHER WORK 

The main topic of this paper is the problem of finding references for training the gating nets in HME networks. We found that the EM algorithm computes references based on a Bernoulli distribution for the data. If the global M step is replaced by a local M step, the resulting algorithm converges faster than stochastic gradient search rules. Blind learning rules are also derived, which generalize the soft-decision equalizer proposed by Nowlan [Now93]. On the other hand, we found that, in an equalization application, the previous symbols of the channel can be used as references for the gating nets. Working in this way, the network is not globally optimized, but learning is much faster and standard linear estimation methods can be applied.

The work carried out in this paper opens several lines for future research. While the EM approach can be applied to any classification problem, the use of previous symbols for training gating networks takes advantage of the fact that every sample at the receiver input depends on more than one transmitted symbol. Other applications have this property, and they will be explored in the future.

Finally, it is possible to use previous symbols as references for starting up the HME network, and then to track the channel variations using the blind EM algorithm. The advantages of this approach over other decision-directed methods are being studied.


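6. APPENDIX A: SUPERVISED EM-BASED ALGORITHM

During the M step of the EM algorithm, the function $Q = E\{L \mid \mathbf{x}, \mathbf{w}_{ij}, \mathbf{v}_i, \mathbf{v}, d\}$ given in Eq. (8) has to be maximized with respect to the network parameters. Differentiating Q with respect to $P_{ij} = P(d \mid \mathbf{w}_{ij}, \mathbf{x})$, $g_{j|i}$ and $g_i$, respectively, we get

$$\nabla_{\mathbf{w}_{ij}} Q = \frac{\partial Q}{\partial P_{ij}} \nabla_{\mathbf{w}_{ij}} P_{ij} = \frac{h_{ij}}{P_{ij}}\,(1 + \varepsilon_{ij})\,\nabla_{\mathbf{w}_{ij}} P_{ij}$$

$$\nabla_{\mathbf{v}_i} Q = \sum_{k} \frac{\partial Q}{\partial g_{k|i}} \nabla_{\mathbf{v}_i} g_{k|i} = \sum_{k} \frac{h_{ik}}{g_{k|i}}\,(1 + \varepsilon_{ik})\,\nabla_{\mathbf{v}_i} g_{k|i}$$

$$\nabla_{\mathbf{v}} Q = \sum_{k} \frac{\partial Q}{\partial g_{k}} \nabla_{\mathbf{v}} g_{k} = \sum_{k} \sum_{l} \frac{h_{kl}}{g_{k}}\,(1 + \varepsilon_{kl})\,\nabla_{\mathbf{v}} g_{k}$$

where $\varepsilon_{ij}$ has been defined in Eq. (12). Considering a Bernoulli model for the output of each expert, we can write

$$\nabla_{\mathbf{w}_{ij}} P_{ij} = (2d - 1)\, y_{ij}(1 - y_{ij})\,\mathbf{x} = P_{ij}\,(d - y_{ij})\,\mathbf{x}$$

$$\nabla_{\mathbf{v}_i} g_{1|i} = -\nabla_{\mathbf{v}_i} g_{0|i} = g_{1|i}(1 - g_{1|i})\,\mathbf{x}$$

$$\nabla_{\mathbf{v}} g_{1} = -\nabla_{\mathbf{v}} g_{0} = g_{1}(1 - g_{1})\,\mathbf{x}$$

so that the learning rules (10) for a 3-level HME network follow.

The generalization of these formulas to an N-level HME network is straightforward, considering that each of the variables $g_{ij}$, $P_{ij}$, $g_i$ will have more indices in such cases.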


7. APPENDIX B: BLIND EM-BASED ALGORITHM

A blind equalizer results when we assume that d is also a hidden variable. In such a case, the expectation of Eq. (7) with respect to d has to be computed before the M step, arriving at

$$E_d\{Q\} = P(d = 0)\, Q(d = 0) + P(d = 1)\, Q(d = 1)$$

Considering that the HME network is an optimum Bayesian estimator, the output of the network is $y = P(d = 1)$, so

$$E_d\{Q\} = (1 - y)\, Q(d = 0) + y\, Q(d = 1)$$

So, the resulting learning rules are as follows,

$$\begin{aligned} \Delta\mathbf{w}_{ij} &= \rho\left(y\, f_{ij}(d = 1) - \bar{f}_{ij}\, y_{ij}\right)\mathbf{x} \\ \Delta\mathbf{v}_i &= \rho\left(\bar{f}_{i1} - \bar{f}_i\, g_{1|i}\right)\mathbf{x} \\ \Delta\mathbf{v} &= \rho\left(\bar{f}_1 - g_1\right)\mathbf{x} \end{aligned}$$

where

$$\bar{f}_{ij} = (1 - y)\, f_{ij}(d = 0) + y\, f_{ij}(d = 1), \qquad \bar{f}_i = (1 - y)\, f_i(d = 0) + y\, f_i(d = 1) = \sum_{k} \bar{f}_{ik}$$
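The blind rules of Appendix B only change how the weights f_ij are averaged: the unknown d is replaced by its soft estimate y = P(d = 1). A minimal, hypothetical NumPy sketch for the 3-level case follows; the names and array shapes are our own assumptions, not the authors' implementation. With all weights at zero every module outputs 0.5 and the update vanishes, which gives a quick sanity check.

```python
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def _f(g_ij, y_ij, y, d):
    """f_ij = h_ij (1 + eps_ij), Eqs. (11)-(12), for one hypothesised value of d."""
    h = g_ij * (1.0 - d - y_ij) / (1.0 - d - y)
    log_h = np.log(np.maximum(h, 1e-12))       # guard against log(0)
    return h * (1.0 + log_h - np.sum(h * log_h))

def blind_em_step(W, V1, v, x, rho):
    """Soft-decision-directed update: d is hidden, with P(d = 1) = y."""
    y_ij = sigm(W @ x)                         # expert outputs, shape (2, 2)
    g1_i = sigm(V1 @ x)                        # g_{1|i}
    g1 = sigm(v @ x)                           # g_1
    g_ij = np.array([1.0 - g1, g1])[:, None] * np.stack([1.0 - g1_i, g1_i], axis=1)
    y = float(np.sum(g_ij * y_ij))
    f0, f1 = _f(g_ij, y_ij, y, 0.0), _f(g_ij, y_ij, y, 1.0)
    fbar = (1.0 - y) * f0 + y * f1             # averaged f_ij
    fbar_i = fbar.sum(axis=1)                  # averaged f_i
    W = W + rho * (y * f1 - fbar * y_ij)[:, :, None] * x
    V1 = V1 + rho * (fbar[:, 1] - fbar_i * g1_i)[:, None] * x
    v = v + rho * (fbar_i[1] - g1) * x
    return W, V1, v
```

In practice such a blind update would be run after a supervised start-up, as suggested in the conclusions, since the all-zero initialization is a stationary point.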

8. REFERENCES 

[Cid94] J. Cid-Sueiro, A. R. Figueiras-Vidal: The Role of Objective Functions in Modular Classification (with an Equalization Application); Proc. of the 1st Int. Conf. on Neural, Parallel and Scientific Computations, pp. 110-115; Atlanta, GA, May 1995.

[Cid95] J. Cid-Sueiro, A. R. Figueiras-Vidal: Digital Equalization using Modular Neural Networks: an Overview; Proc. of the 7th Int. Tyrrhenian Workshop on Dig. Comm., pp. 337-345; Viareggio, Italy, Sep. 1995.

[Dem77] A. P. Dempster, N. M. Laird, D. B. Rubin: Maximum Likelihood from Incomplete Data via the EM Algorithm; J. R. Statist. Soc. B, Vol. 39, No. 1, pp. 1-38, 1977.

[Gel93] S. B. Gelfand, C. S. Ravishankar, E. J. Delp: Tree-Structured Piecewise Linear Adaptive Equalization; IEEE Transactions on Communications, Vol. 41, No. 1, pp. 70-82, Jan. 1993.

[Jor91] M. I. Jordan, R. A. Jacobs: Hierarchical Mixtures of Experts and the EM Algorithm; Neural Computation, Vol. 6, pp. 181-214, 1991.

[Kec94] G. Kechriotis, E. Zervas, E. S. Manolakos: Using Recurrent Neural Networks for Adaptive Communication Channel Equalization; IEEE Trans. on Neural Networks, Vol. 5, No. 2, pp. 267-278, Mar. 1994.

[Mul96] B. Mulgrew: Applying Radial Basis Functions; IEEE Signal Processing Magazine, Vol. 13, No. 2, Mar. 1996.

[Now93] S. J. Nowlan, G. E. Hinton: A Soft Decision-Directed LMS Algorithm for Blind Equalization; IEEE Transactions on Communications, Vol. 41, No. 2, pp. 275-279, Feb. 1993.





NONLINEAR RECURSIVE ALGORITHMS FOR DATA TRANSMISSION EQUALIZATION

E. Soria-Olivas (1), J. Calpe-Maravilla (1), A. R. Figueiras-Vidal (2)


(1) G.P.D.S., Dpto. de Informática y Electrónica, Facultad de Físicas.
C/ Dr. Moliner 50, 46100 Burjassot (Valencia), Spain
e-mail: emilio.soria@uv.es

(2) D.L., E.P.S. Telecomunicación, Universidad Carlos III
C/ Butarque 15, 28911 Leganés (Madrid), Spain
e-mail: anibal@gtts.ssr.upm.es


ABSTRACT 

In this paper some recursive adaptive algorithms for classification problems are studied. These algorithms result from including a non-linearity at the output of the adaptive filters. The target application is binary classification, which determines the type of non-linearity used. The developed algorithms are compared in a typical channel equalization problem, demonstrating their good performance.


1. INTRODUCTION. 

In the application proposed in this paper, channel equalization, the adaptive system must classify samples into two classes (binary classification). Figure 1 shows a diagram of a transversal adaptive equalizer.



Figure 1: Transversal adaptive equalizer diagram (LTE: Linear Transversal Equalizer)

The hard threshold applied to the LTE output classifies the input into one of the two considered classes. By observing Figure 1, we may introduce the first modification to the proposed system: as the goal of our system is a classification into two classes, we apply a nonlinearity before the hard threshold (Figure 2).


Figure 2: Proposed system representation, with input x(n), filter output y(n) and system output o(n).


In this paper classes are assumed to be ±1; thus, the proposed function is:

$$F(y) = \begin{cases} -1, & y < -1 \\ y, & |y| \le 1 \\ 1, & y > 1 \end{cases}$$

If the analyzed sequence were formed by 0 and 1, the previously defined function would take the value 0 for inputs lower than -1.

The next step in deriving the new algorithms is the choice of the filter. Since the aim is to apply recursive least-squares (RLS) algorithms, which converge much faster than the usual LMS, we will use FIR filters [1].

The next section proposes and develops several algorithms applied to the structure given in Figure 2.

2. DEVELOPMENT OF THE ALGORITHMS 

2.1 Nonlinear Recursive Least Squares I (NRLSI)

If the adaptive system given in Figure 2 is very near convergence, the error and the output signal are orthogonal [1]. Hence, if we use an error criterion based on an exponentially decreasing weighting, this leads to:

$$\sum_{i=1}^{n} \lambda^{n-i}\, d(i)\, o(i) = \sum_{i=1}^{n} \lambda^{n-i}\, o(i)\, o(i)$$

Then, we have three different possibilities:

a) If y(i) > 1, the system output o(i) is equal to 1 and the previous sums become:

$$\sum_{i=1}^{n} \lambda^{n-i}\, d(i) \quad \text{and} \quad \sum_{i=1}^{n} \lambda^{n-i}$$

Thus, we may distinguish two sub-cases depending on the first term:

a.1) If they are equal (i.e., d(i) = o(i)), the system is not affected, because it works correctly and needs no modification.

a.2) If they are different, we should modify the system, but since we assume it to be near convergence, no modification is performed.

b) If y(i) < -1, we have a situation analogous to a).

c) If |y(i)| < 1, applying the definition of F, the summed terms become:

$$\sum_{i=1}^{n} \lambda^{n-i}\, d(i)\, \mathbf{w}^T\mathbf{x}(i) \quad \text{and} \quad \sum_{i=1}^{n} \lambda^{n-i}\, \mathbf{w}^T\mathbf{x}(i)\,\mathbf{x}^T(i)\,\mathbf{w}$$

and we have to adapt the system. As usual, the starting point for the adaptation is the minimization of the weighted sum of errors, which leads to the normal equations, but considering that weight modifications occur exclusively in the linear zone.

In a standard RLS problem, the autocorrelation matrix of the input signal and the cross-correlation vector between the desired signal and the input are updated as [1]:

$$\mathbf{R}(n+1) = \lambda\,\mathbf{R}(n) + \mathbf{X}(n+1)\,\mathbf{X}^T(n+1)$$

$$\mathbf{g}(n+1) = \lambda\,\mathbf{g}(n) + d(n+1)\,\mathbf{X}(n+1)$$

with

$$\mathbf{X}(n) = [x(n), \ldots, x(n-L+1)]^T$$

where L is the adaptive filter length.

To obtain the new algorithm, we suppose that from iteration n to iteration n+s no changes have happened, i.e., inside that interval the output of the system remains outside the linear zone. Applying the previous equations iteratively, we get

$$\mathbf{R}(n+s) = \lambda^{s}\mathbf{R}(n) + \mathbf{X}(n+s)\mathbf{X}^T(n+s) + \cdots + \lambda^{s-1}\mathbf{X}(n+1)\mathbf{X}^T(n+1)$$

$$\mathbf{g}(n+s) = \lambda^{s}\mathbf{g}(n) + d(n+s)\mathbf{X}(n+s) + \cdots + \lambda^{s-1} d(n+1)\mathbf{X}(n+1)$$

When adapting the filter coefficients, we could keep all the terms obtained from the previous equations; this procedure would lead to a "block" version of RLS. The proposed modification consists in neglecting all the terms of the sums except the first two, i.e., $\mathbf{R}(n+s) \approx \lambda^{s}\mathbf{R}(n) + \mathbf{X}(n+s)\mathbf{X}^T(n+s)$, and similarly for $\mathbf{g}$. In this way, the algorithm discards all the samples outside the linear zone. With this approximation, the equations are analogous to those presented in [1] for the RLS, but replacing the parameter λ with $\lambda^{s}$, where s is the number of iterations without changes. A benefit worth remarking is the computational saving: no computations are performed while the adaptive filter output lies outside the linear zone.
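A minimal NumPy sketch of the NRLSI idea follows, assuming a standard exponentially weighted RLS recursion driven by the a priori error and initialized with P(0) = δI; δ, the filter length L and the function name are illustrative choices, not values taken from the paper. Updates are skipped whenever |y(n)| ≥ 1, and the next update uses λ^s instead of λ.

```python
import numpy as np

def nrls1(x, d, L=5, lam=0.99, delta=100.0):
    """Sketch of an RLS equalizer that adapts only in the linear zone |y| < 1."""
    w = np.zeros(L)
    P = delta * np.eye(L)        # inverse of the weighted autocorrelation matrix R
    X = np.zeros(L)              # regressor [x(n), ..., x(n-L+1)]
    s = 1                        # iterations since the last update
    for n in range(len(x)):
        X = np.roll(X, 1)
        X[0] = x[n]
        y = w @ X
        if abs(y) >= 1.0:        # saturated output: skip, no RLS computations
            s += 1
            continue
        lam_s = lam ** s         # lambda replaced by lambda**s
        k = P @ X / (lam_s + X @ P @ X)          # gain vector
        w = w + k * (d[n] - y)                   # linear zone: o(n) = y(n)
        P = (P - np.outer(k, X @ P)) / lam_s     # inverse-correlation update
        s = 1
    return w
```

A skipped iteration costs a single inner product, which is the source of the savings reported in Table I.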


2.2 Nonlinear Recursive Least Squares II (NRLSII)

The second algorithm is obtained by "smoothing" the first. This results from applying the function shown in Figure 3 to the output of the filter. In this way, the output always remains between ±1, but in a smoother way.

Figure 3: Function used in NRLSII

The next step consists of avoiding changes of the filter coefficients when the output is near the extreme values. This is obtained by multiplying the gradient term (error times input) by a factor that vanishes as the output approaches ±1. With this product, we would obtain the gradient term that accounts for the action of the non-linearity; note, however, that a standard RLS algorithm is carried out, and the added factor then prevents changes in the filter coefficients. Hence, we have the equations given in [1] for the RLS, but including an extra term when updating the weights.

Karayiannis and Venetsanopoulos [2] propose a similar algorithm, ELEANNE 3. However, there are some differences from the one proposed here, because ELEANNE 3 uses the proposed factor when calculating the inverse autocorrelation matrix, which is not done here.

2.3 Nonlinear Recursive Least Squares III (NRLSIII)

Following the considerations made in [3], the outputs of the system may be regarded as the probability that the element $d_j$ of the desired output takes the value 0 or 1. Assuming mutual independence among samples, we have [3]:

$$P(d \mid x) = \prod_{j=1}^{n} o_j^{d_j}\,(1 - o_j)^{1 - d_j}$$

The logarithm of the previous expression is, up to its sign, the cross-entropy between the output and the desired output. Our system assigns the values ±1 to these signals. If we consider these new variables, we get the expression:

$$P(d \mid x) = \prod_{j=1}^{n} \left(\frac{1 + o_j}{2}\right)^{\frac{1 + d_j}{2}} \left(\frac{1 - o_j}{2}\right)^{\frac{1 - d_j}{2}}$$

This equation is equivalent to:

$$P(d \mid x) = \exp\left\{\sum_{j=1}^{n}\left[\frac{1 + d_j}{2}\ln\frac{1 + o_j}{2} + \frac{1 - d_j}{2}\ln\frac{1 - o_j}{2}\right]\right\}$$

Handling this expression, we have:

$$P(d \mid x) = \exp\left\{\sum_{j=1}^{n}\left[\frac{d_j}{2}\ln\frac{1 + o_j}{1 - o_j} + \frac{1}{2}\ln\frac{1 - o_j^2}{4}\right]\right\}$$

and defining:

$$\ln\frac{1 + o_j}{1 - o_j} = y_j$$

where $y_j$ is the adaptive filter output, we obtain:

$$o_j = \frac{e^{y_j} - 1}{e^{y_j} + 1}$$

which gives $o_j$. Substituting this result in the expression for $P(d \mid x)$, we get:

$$P(d \mid x) = \exp\left\{\sum_{j=1}^{n}\left[\frac{d_j\, y_j}{2} - \ln\left(e^{y_j/2} + e^{-y_j/2}\right)\right]\right\}$$

Now we want to maximize the previous function (maximum likelihood) with respect to the filter coefficients W. A possible iterative method for solving this problem is the Newton-Raphson method [4]:

$$W(n+1) = W(n) - \left[\frac{\partial^2 \ln P(d \mid x)}{\partial W\, \partial W^T}\right]^{-1} \frac{\partial \ln P(d \mid x)}{\partial W}\,\Bigg|_{W(n)}$$

Calculating the derivatives, with $y_j = W^T x_j$:

$$\frac{\partial \ln P(d \mid x)}{\partial W} = \frac{1}{2}\sum_{j=1}^{n}(d_j - o_j)\, x_j$$

and:

$$\frac{\partial^2 \ln P(d \mid x)}{\partial W\, \partial W^T} = -\frac{1}{4}\sum_{j=1}^{n}(1 - o_j^2)\, x_j x_j^T$$

If we define the variable:

$$R(n) = \sum_{j=1}^{n}(1 - o_j^2)\, x_j x_j^T$$

the second derivative may be obtained in a recursive way, because:

$$R(n+1) = R(n) + u_{n+1} u_{n+1}^T, \qquad u_j = \sqrt{1 - o_j^2}\; x_j$$

and the matrix-inversion lemma [1] may be applied to this expression. We thus have a recursive method to solve classification problems in a way similar to those given in [2]. Unlike [2], however, the new algorithm uses a non-linear function when applying the recursive algorithms; besides, no approximations have been made.

We must point out one drawback of this method: when we are very near the optimal system, the gradient term is not zero, because all the terms since the beginning of the iterations are considered. To solve this problem, only the last term of the sum that gives the gradient is taken into account.
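The recursion can be sketched as follows, assuming the bipolar output o_j = tanh(y_j/2) (equivalent to o_j = (e^{y_j} - 1)/(e^{y_j} + 1)), targets d_j in {-1, +1}, an initialization P(0) = δI and the single-term gradient just described, so that each step reads W(n+1) = W(n) + 2R^{-1}(n)(d_n - o_n)x_n. The optional forgetting factor `lam` and all names are our own assumptions; this is an illustration, not the authors' implementation.

```python
import numpy as np

def nrls3(X, d, lam=1.0, delta=10.0):
    """Recursive Newton-Raphson classifier in the spirit of NRLSIII.

    X: (n, L) input vectors x_j; d: (n,) targets in {-1, +1}.
    P tracks R^{-1}, with R(n) = lam * R(n-1) + u u^T and u = sqrt(1 - o^2) x.
    """
    _, L = X.shape
    w = np.zeros(L)
    P = delta * np.eye(L)
    for xj, dj in zip(X, d):
        o = np.tanh(0.5 * (w @ xj))               # o_j = (e^y - 1) / (e^y + 1)
        u = np.sqrt(max(1.0 - o * o, 0.0)) * xj   # u_j = sqrt(1 - o_j^2) x_j
        Pu = P @ u
        P = (P - np.outer(Pu, Pu) / (lam + u @ Pu)) / lam   # matrix-inversion lemma
        w = w + 2.0 * P @ ((dj - o) * xj)         # Newton step, last gradient term only
    return w
```

Samples with a saturated output (o close to ±1) contribute almost nothing to R, so the curvature estimate is dominated by the samples near the decision boundary.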

3. EXPERIMENTAL RESULTS 

The proposed algorithms have been tested in a channel equalization problem such as the one shown in Figure 1. The simulations assume the transmission channel to have the following transfer function:


We study the convergence speed of the proposed algorithms. At each iteration, we computed the number of errors made by the filter when classifying a fixed number of samples (10^).

In the comparison, algorithms NRLSI and NRLSII have been considered jointly, because both share the same basis and differ only in the kind of non-linearity (hard or smooth) applied to the output. These algorithms have been compared with the standard RLS. To make the differences in speed visible, an SNR of 5 dB has been used. The three algorithms use the same parameter value (0.99). Furthermore, the results are averages over 10 experiments.


Figure 4: Speed of convergence for RLS, NRLSI and NRLSII.


Figure 4 shows that NRLSI converges in a non-uniform way, while NRLSII smooths out all these jumps. The three algorithms converge to the same final state and differ only in the first iterations, with NRLSII being faster than the saturated version. The next comparison of convergence speed is made among the three proposed algorithms, keeping the same SNR (Figure 5).



Figure 5: Speed of convergence for NRLSI, NRLSII and NRLSIII.


The comparison among the three algorithms leads us to the same conclusions previously stated: differences are found in the initial iterations, while the final error is the same in all three cases. NRLSIII shows a performance halfway between the other two.

Once the convergence speed has been compared, the probability of error after convergence is studied. Figure 6 shows this probability for the proposed channel and different SNR values. Again, the same parameter value (0.99) is maintained. The same BER after convergence is observed in all three variants.


Figure 6: BER after convergence for different SNR

To conclude, Table I below shows the computational savings in NRLSI due to adapting the filter coefficients only in the segment between ±1. It gives the number of iterations in which the weights are updated, out of a total of 2500, taking an error threshold under which no adaptation is made. The table has been obtained as an average over 10 experiments.

SNR (dB)    5         10        15        20
RLS         2467.2    2477.5    2477.1    2477.3
NRLSI       1659.7    1584      1602.7    1605.3

Table I: Number of iterations (out of 2500) in which the weights are updated, for different SNR
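The relative savings implied by Table I can be computed directly from its entries. This short Python snippet only restates the table's arithmetic; the variable and function names are our own.

```python
snr_db = [5, 10, 15, 20]
rls_updates = [2467.2, 2477.5, 2477.1, 2477.3]     # Table I, RLS row
nrls1_updates = [1659.7, 1584.0, 1602.7, 1605.3]   # Table I, NRLSI row

def relative_savings(baseline, proposed):
    """Fraction of weight updates avoided by `proposed` with respect to `baseline`."""
    return [1.0 - p / b for b, p in zip(baseline, proposed)]

savings = relative_savings(rls_updates, nrls1_updates)
for snr, s in zip(snr_db, savings):
    print(f"{snr:2d} dB: {100 * s:.1f}% fewer updates than RLS")
```

At every SNR in the table, NRLSI performs roughly a third fewer weight updates than the RLS.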

4. FUTURE WORK 

Extensions of this work will focus on generalizing the proposed algorithms so that they can be applied to multilayer perceptrons. This generalization aims to use fast learning algorithms such as RLS to overcome the low learning speed of these systems.

5. CONCLUSIONS 

In this paper, three non-linear recursive algorithms for channel equalization have been proposed. Their performance has been verified through simulations. The results show the ability of these systems to equalize communication channels, which is a first step towards extending their use to neural networks.
ABSTRACT

A fully adaptive algorithm for blind channel equalization is presented. It is based on an adaptive matrix singular value decomposition (SVD) for a (virtual) channel identification type operation, together with the Viterbi algorithm for subsequent symbol detection. True channel modeling, however, is avoided, as will be explained. Simulation results for a GSM-type setup are presented.

1. INTRODUCTION 

The problem of blind channel identification/equalization using second-order statistics or equivalent deterministic properties of the oversampled channel output has drawn considerable attention recently. Most of the algorithms developed so far, however, are based on block processing and have a high computational complexity, which is an impediment to real-time implementation.

Here we present a new algorithm which is fully adaptive. It has a low computational complexity and, furthermore, it can track rapidly varying channels. It will be shown that it closely approximates the performance of previously developed blind equalization algorithms, despite its reduced computational complexity.

Piet Vandaele is a Research Assistant supported by the I.W.T. and M. Moonen is a Research Associate with the Belgian National Fund for Scientific Research (N.F.W.O.). This research work was carried out at the ESAT laboratory of the Katholieke
tion Project of the Flemish Community, entitled Model-based Information Processing Systems, and partly in the framework of the IT-program of the I.W.T., Integrating Signal Processing Systems (ITA/GBO/T23). The scientific responsibility is assumed by its authors.


2. DATA MODEL 

The received signal for linear digital modulation over a linear channel with additive noise is

$$y(t) = \sum_k h(t - kT)\, x[k] + n(t)$$

where the $x[k]$ are the transmitted symbols, $T$ is the symbol period, $h(t)$ is the composite channel impulse response (it includes the transmitter, channel and receiver filters), and $n(t)$ is additive noise. The channel is assumed to be FIR with a duration of approximately $LT$. With an oversampling factor $M$, the sampling instants for the received signal are $t_0 + (k + \frac{i-1}{M})T$ for integer $k$ and $i = 1, 2, \ldots, M$. It is common to use a so-called polyphase description

$$y_i[k] = y\big(t_0 + (k + \tfrac{i-1}{M})T\big), \quad n_i[k] = n\big(t_0 + (k + \tfrac{i-1}{M})T\big), \quad h_i[k] = h\big(t_0 + (k + \tfrac{i-1}{M})T\big), \quad i = 1, 2, \ldots, M$$

and to view the oversampled received signal as an $M$-channel output signal at the symbol rate [9,10]. Define the output vector $\mathbf{y}[k] = [y_1[k] \ldots y_M[k]]^T$, the input vector $\mathbf{x}[k] = [x[k-L] \ldots x[k]]^T$ and the noise vector $\mathbf{n}[k] = [n_1[k] \ldots n_M[k]]^T$; then


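$$\mathbf{y}[k] = \underbrace{\begin{bmatrix} h_1[L] & \cdots & h_1[0] \\ \vdots & & \vdots \\ h_M[L] & \cdots & h_M[0] \end{bmatrix}}_{\mathbf{H}} \mathbf{x}[k] + \mathbf{n}[k]$$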
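The polyphase relation $\mathbf{y}[k] = \mathbf{H}\mathbf{x}[k]$ can be checked numerically. In the sketch below (plain Python, noise-free), `g` is a hypothetical channel sampled at spacing $T/M$, so that $h_i[l]$ = `g[l*M + i]` with a 0-based phase index $i$; the rows of $\mathbf{H}$ are $[h_i[L], \ldots, h_i[0]]$ as in the matrix above. The function names are illustrative only.

```python
def polyphase_matrix(g, M, L):
    """H (M x (L+1)): row i is [h_i[L], ..., h_i[0]] with h_i[l] = g[l*M + i]."""
    if len(g) != (L + 1) * M:
        raise ValueError("g must hold (L+1)*M oversampled taps")
    return [[g[l * M + i] for l in range(L, -1, -1)] for i in range(M)]


def oversampled_output(g, x, M):
    """Noise-free oversampled output r[m] = sum_k g[m - k*M] x[k]."""
    r = [0.0] * ((len(x) - 1) * M + len(g))
    for k, xk in enumerate(x):
        for n, gn in enumerate(g):
            r[k * M + n] += gn * xk
    return r


def polyphase_output(H, x, k, L):
    """y[k] = H x[k] with x[k] = [x[k-L], ..., x[k]]; requires k >= L."""
    xk = x[k - L:k + 1]
    return [sum(a * b for a, b in zip(row, xk)) for row in H]


# Usage: every phase i of y[k] equals the oversampled sample at index k*M + i.
M, L = 2, 2
g = [0.1, 0.4, 1.0, 0.6, 0.2, 0.05]      # hypothetical channel, (L+1)*M samples
x = [1, -1, -1, 1, 1, -1, 1]             # BPSK symbols
H = polyphase_matrix(g, M, L)
r = oversampled_output(g, x, M)
for k in range(L, len(x)):
    y_k = polyphase_output(H, x, k, L)
    assert all(abs(y_k[i] - r[k * M + i]) < 1e-12 for i in range(M))
```

The check makes explicit that the symbol-rate $M$-channel model is only a re-indexing of the oversampled convolution.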
Spatial oversampling, i.e. using multiple antennas at the receiver, fits into the same framework by considering $y_1[k], \ldots, y_M[k]$ as the outputs of $M$ receiving antennas. From here on, we may therefore consider $M$ to be the product of the spatial and temporal oversampling factors.

For the sake of short notation, it is assumed that $\mathbf{n}[k] = 0$. The computational scheme to be presented involves an SVD, which is assumed to be robust against such additive noise [3].

With the above input/output formula, a data model can be set up as follows (it has already been the starting point for many algorithmic developments). Define




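$$\mathcal{Y}_{k|k+i-1}^{(j)} = \begin{bmatrix} \mathbf{y}[k] & \mathbf{y}[k+1] & \cdots & \mathbf{y}[k+j-1] \\ \mathbf{y}[k+1] & \mathbf{y}[k+2] & \cdots & \mathbf{y}[k+j] \\ \vdots & \vdots & & \vdots \\ \mathbf{y}[k+i-1] & \mathbf{y}[k+i] & \cdots & \mathbf{y}[k+i+j-2] \end{bmatrix}$$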


(the superscript refers to the number of columns, the 
subscript refers to the time indices in the first column), 
and with a similar notation 


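$$\mathcal{X}_{k-L|k+i-1}^{(j)} = \begin{bmatrix} x[k-L] & x[k-L+1] & \cdots & x[k-L+j-1] \\ x[k-L+1] & x[k-L+2] & \cdots & x[k-L+j] \\ \vdots & \vdots & & \vdots \\ x[k+i-1] & x[k+i] & \cdots & x[k+i+j-2] \end{bmatrix}$$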


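then,

$$\mathcal{Y}_{k|k+i-1}^{(j)} = \underbrace{\begin{bmatrix} \mathbf{H} & 0 & \cdots & 0 \\ 0 & \mathbf{H} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \mathbf{H} \end{bmatrix}}_{\mathcal{H}} \mathcal{X}_{k-L|k+i-1}^{(j)}$$

where each $0$ denotes an $M \times 1$ zero column, so that $\mathcal{H}$ is the $iM \times (L+i)$ block Toeplitz matrix in which each block row is the previous one shifted one column to the right.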


Here, $\mathcal{Y}_{k|k+i-1}^{(j)}$ is a known matrix. The aim is to compute the symbol sequence from $\mathcal{Y}_{k|k+i-1}^{(j)}$, with or without computing $\mathcal{H}$ explicitly.
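The structured relation between the two Hankel matrices can be verified on a toy example. The sketch below (plain Python, noise-free, hypothetical function names) builds $\mathcal{Y}$, $\mathcal{X}$ and the block Toeplitz $\mathcal{H}$ from an $M \times (L+1)$ matrix $\mathbf{H}$ laid out as in the data model, and checks the product.

```python
def block_hankel_Y(y, k, i, j):
    """iM x j matrix whose block (r, c) is the M-vector y[k + r + c]."""
    M = len(y[k])
    return [[y[k + r + c][m] for c in range(j)] for r in range(i) for m in range(M)]


def hankel_X(x, k, L, i, j):
    """(L+i) x j Hankel matrix with entry (r, c) = x[k - L + r + c]."""
    return [[x[k - L + r + c] for c in range(j)] for r in range(L + i)]


def block_toeplitz(H, i):
    """iM x (L+i) matrix: block row r holds H in columns r..r+L, zeros elsewhere."""
    M, width = len(H), len(H[0])          # width = L + 1
    out = []
    for r in range(i):
        for m in range(M):
            row = [0.0] * (width - 1 + i)
            row[r:r + width] = H[m]
            out.append(row)
    return out


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Usage with a hypothetical M = 2, L = 2 channel and noise-free outputs y[n] = H x[n].
H = [[0.2, 1.0, 0.5], [0.1, 0.7, 0.3]]
L, i, j, k = 2, 2, 4, 2
x = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]
y = {n: [sum(h[l] * x[n - L + l] for l in range(L + 1)) for h in H] for n in range(L, len(x))}
Y = block_hankel_Y(y, k, i, j)
X = hankel_X(x, k, L, i, j)
assert all(abs(a - b) < 1e-12
           for ra, rb in zip(matmul(block_toeplitz(H, i), X), Y) for a, b in zip(ra, rb))
```

Note that $\mathcal{Y}$ has $iM$ rows but is generated by only $L+i$ rows of symbols, which is the rank deficiency the SVD-based methods below exploit.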


3. BLIND EQUALIZATION 

Block Processing Algorithm [5] 


The algorithm of Liu & Xu [5] is based on a singular value decomposition of $\mathcal{Y}_{k|k+i-1}^{(j)}$ (a 'short-fat' matrix, i.e. with many more columns than rows). Then a symbol sequence is sought whose Hankel matrix $\mathcal{X}_{k-L|k+i-1}^{(j)}$ best matches the row space of $\mathcal{Y}_{k|k+i-1}^{(j)}$. With $V^{\perp}$ the (approximate) null space of $\mathcal{Y}_{k|k+i-1}^{(j)}$ (i.e. $\mathcal{Y}_{k|k+i-1}^{(j)} V^{\perp} \approx 0$), which is extracted from the V-matrix of the SVD, this is equivalent to

$$\min \big\| \mathcal{X}_{k-L|k+i-1}^{(j)} V^{\perp} \big\|_F$$

(subject to some constraint to avoid the trivial solution), which may be solved by means of standard least squares techniques.


Adaptive Algorithm 


Here, a modification to the above scheme is presented, 
which is fully adaptive and furthermore employs the 
Viterbi algorithm in an efficient manner. 

A crucial observation is that the $V^{\perp}$ in the block processing algorithm may be viewed as a (virtual) FIR channel (obviously different from the physical channel) that produces a zero output when fed with a specific segment of the symbol sequence.

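A new SVD of $\mathcal{Y}_{k|k+i-1}^{(j)}$ is computed in each time step (i.e. in each symbol period). This may be viewed as identifying a time-varying FIR channel for which

$$\mathcal{Y}_{k|k+i-1}^{(j)} V_k^{\perp} = 0$$

or equivalently,

$$\mathcal{X}_{k-L|k+i-1}^{(j)} V_k^{\perp} = 0.$$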



This last equation can then be applied in a Viterbi algorithm in a straightforward manner, in order to obtain the original symbol sequence from the knowledge of the 'virtual' FIR channel and its zero output. State transitions in the Viterbi trellis are then governed by a cost function $\| \mathcal{X}_{k-L|k+i-1}^{(j)} V_k^{\perp} \|_F$, where $\| \cdot \|_F$ denotes the Frobenius norm. Increasing $j$ provides better noise averaging, but the number of states in the Viterbi algorithm grows exponentially with $j$ (it is a power of two for a BPSK symbol constellation).
Figure 1: Viterbi trellis at iteration step 3

Because of this exponential complexity of the Viterbi algorithm, $j$ should be kept as small as possible. On the other hand, for highly time-varying channels, averaging over long data sequences may not be meaningful, so the use of smaller matrices may be imposed by practical considerations anyway.

We further reduce the complexity of the Viterbi algorithm in the following way. At iteration step 0, only the first symbol has been received, and hence only the least significant digit of the state number is important. We can choose this digit arbitrarily, since the transmitted symbol sequence can only be determined up to an unknown constant. As a consequence, we can freely choose a starting state in the trellis. This state gets a zero initial cost, while all other states get an initial cost of ∞. Instead of expanding all paths in the trellis at each time instant, we only extend the leaf node having the lowest cost at that iteration step. A 'leaf node' is the endpoint of a partially developed path.

Consider Figure 1, in which a four-state Viterbi trellis is depicted at iteration step 3. At iteration step 0, we chose state 00 as a reference state and gave it zero initial cost, while all other states got initial cost ∞. Next, in iteration step 1 we extended the paths through the coordinates {(0,0),1}, with costs a and b. In the 2nd iteration, a was higher than b, so we extended {(0,1),2}, and in step 3 the costs b+c and b+d were both higher than cost a, so we extended the path through {(0,0),2}. If we apply this method systematically, we are guaranteed to find the minimum-cost path (since all branch costs are norms and hence non-negative, the first complete path to be selected has minimal cost), and this at a lower computational cost, because we no longer expand all paths. A disadvantage is that the computation time is no longer constant; the upper limit, however, stays the same as with the full Viterbi algorithm.
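The leaf-node strategy is a uniform-cost (best-first) search over the trellis. The sketch below keeps a priority queue of partial paths keyed by accumulated cost and always extends the cheapest one; a (time, state) node that has already been extended is skipped, which plays the role of the Viterbi survivor selection. The branch cost is passed in as a function: in the paper it is the Frobenius-norm term above, whereas the usage example substitutes a hypothetical known 2-tap FIR residual so that the result can be checked. All names here are illustrative assumptions.

```python
import heapq


def best_first_trellis(branch_cost, n_steps, start_state, alphabet=(-1, 1)):
    """Extend only the cheapest leaf node per iteration (uniform-cost search).

    States are tuples with the last L symbols (L >= 1). start_state is the
    arbitrary reference state with zero cost; every other state implicitly
    starts at infinity. Branch costs must be non-negative.
    """
    heap = [(0.0, 0, start_state, ())]       # (cost, step, state, symbols so far)
    expanded = set()
    while heap:
        cost, step, state, path = heapq.heappop(heap)
        if step == n_steps:                  # first complete path popped is optimal
            return cost, list(path)
        if (step, state) in expanded:        # a cheaper path already reached this node
            continue
        expanded.add((step, state))
        for sym in alphabet:
            nxt = (state + (sym,))[1:]       # shift register: drop the oldest symbol
            heapq.heappush(heap, (cost + branch_cost(step, state, sym),
                                  step + 1, nxt, path + (sym,)))
    return None


# Usage: hypothetical 2-tap FIR h (L = 1); the reference symbol x[-1] is fixed to +1.
h = (1.0, 0.5)
x_true = [1, -1, -1, 1, 1, -1]
prev = 1
y = [h[0] * s + h[1] * p for s, p in zip(x_true, [prev] + x_true[:-1])]


def residual_cost(step, state, sym):
    return (y[step] - (h[0] * sym + h[1] * state[-1])) ** 2


cost, x_hat = best_first_trellis(residual_cost, len(y), (prev,))
```

In the noise-free example only the true path has zero cost, so it is the first complete path returned; with noise, the search order changes but the upper bound on work remains that of the full trellis.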


For the computation of $V_k^{\perp}$, an efficient SVD-updating algorithm from [7] is used, which allows true recursive (sample-by-sample) processing instead of block processing. This algorithm has a reduced computational complexity and is amenable to application-specific implementation [6].


4. SIMULATION MODEL 


The measurement model assumes a large number of rays impinging on a uniform linear array. Assuming a Rayleigh flat fading channel in which the rays have a Gaussian angular distribution [1] around the nominal direction of arrival, it can be shown [12] that the $M \times 1$ channel impulse response vector $\mathbf{h}$ can be modeled by a Gaussian distribution $\mathcal{N}(0, R_{hh})$. The $(m,n)$ element of the covariance matrix $R_{hh}$ (denoting the correlation between the signals received on the $m$th and $n$th antenna elements) is a function of $\sigma_\theta$, the spread of the angular distribution, $d$, the inter-antenna-element distance, $\lambda$, the wavelength, and $\theta_0$, the nominal direction of arrival of the signal. From $R_{hh}$, simulation samples of $\mathbf{h}$ can be constructed as

$$\mathbf{h} = R_{hh}^{1/2}\,\mathbf{s}$$

where $\mathbf{s}$ is an $M \times 1$ complex zero-mean Gaussian vector with the variance of the real and imaginary parts equal to 1/2.
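A minimal sketch of the sampling step $\mathbf{h} = R_{hh}^{1/2}\mathbf{s}$, in plain Python. Because the closed-form entries of $R_{hh}$ are not reproduced above, the usage example relies on a hypothetical exponentially correlated matrix $0.9^{|m-n|}$ purely for illustration. Any factor $A$ with $AA^H = R_{hh}$ yields the same covariance, so a Cholesky factor is used here in place of a Hermitian square root; this is our assumption about the construction, not a claim about the authors' code.

```python
import math
import random


def cholesky(R):
    """Lower-triangular A with A A^H = R, for Hermitian positive definite R."""
    n = len(R)
    A = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = R[i][j] - sum(A[i][k] * A[j][k].conjugate() for k in range(j))
            if i == j:
                A[i][i] = complex(math.sqrt(s.real), 0.0)
            else:
                A[i][j] = s / A[j][j]
    return A


def sample_channel(R, rng=random):
    """One draw h = A s with s ~ CN(0, I): real and imaginary parts have variance 1/2."""
    A = cholesky(R)
    sd = math.sqrt(0.5)
    s = [complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(len(R))]
    return [sum(A[i][k] * s[k] for k in range(len(R))) for i in range(len(R))]


# Usage: a hypothetical 6-antenna covariance, NOT the paper's closed-form R_hh.
R_example = [[0.9 ** abs(m - n) for n in range(6)] for m in range(6)]
h = sample_channel(R_example, random.Random(0))
```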

This model can be extended to frequency-selective fading by including several time clusters of scatterers. The $r$th cluster is then characterized by its vector $\mathbf{h}_r$ and its time delay $\tau_r$. If $P$ denotes the number of clusters, then the received signal $\mathbf{y}(t)$ can be expressed as

$$\mathbf{y}(t) = \sum_{r=1}^{P} \mathbf{h}_r\, x(t - \tau_r)$$
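A sketch of the cluster model, assuming delays that are integer multiples of the sampling period (arbitrary $\tau_r$ would need fractional-delay interpolation, which is not shown). `clusters` is a hypothetical list of $(\mathbf{h}_r, \tau_r)$ pairs, and $x$ is taken as zero outside its support.

```python
def frequency_selective_output(clusters, x, n_samples):
    """y[t] = sum_r h_r x[t - tau_r] for integer delays tau_r.

    clusters: list of (h_r, tau_r), each h_r an M-element list (real or complex).
    """
    M = len(clusters[0][0])
    y = []
    for t in range(n_samples):
        yt = [0j] * M
        for h_r, tau_r in clusters:
            if 0 <= t - tau_r < len(x):          # x is zero outside its support
                xt = x[t - tau_r]
                yt = [a + b * xt for a, b in zip(yt, h_r)]
        y.append(yt)
    return y


# Usage: two hypothetical clusters on a 2-element array, delays 0 and 2 samples.
y = frequency_selective_output([([1.0, 0.8j], 0), ([0.3, -0.2], 2)], [1, -1, 1], 5)
```

With $P = 1$ and $\tau_1 = 0$ this reduces to the flat-fading model above, where each sample is simply $\mathbf{h}\,x(t)$.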

The modulation scheme at the transmitter is GMSK. Although this is a nonlinear type of modulation, it was shown in [11] that GMSK modulation can be approximated well by a linear channel model, requiring only minor modifications to the receiver algorithms.


5. SIMULATION RESULTS

In order to quantify the performance of the proposed algorithm, it is compared against [8], which has proven to be very efficient in combating multipath effects [2]. The main computational steps of this algorithm are summarized in the appendix.
Figure 2: BER estimation


In a first setup we took bursts of 150 symbols and generated a time-invariant channel using the above model with the reduced TUx tap specifications [4]. We performed 400 runs, and a new channel model was generated in each run.

In a second setup we considered the case of a channel whose time constant is about 20 symbols. In order not to violate the channel burst-invariance assumption of algorithm [8], we generated bursts of 20 symbols.

We use 6 antennas, each 2 times oversampled; the antenna-element spacing is λ/2. The SNR is taken with respect to the input power of the channel, i.e. $E[|x[k]|^2]$. In both setups the parameters $i$ and $j$ of our algorithm were set to 2 and 8, respectively. The channel order was estimated from the SVD singular values.

The results are depicted in Figure 2. The figure shows that for blocks of 150 symbols, algorithm [8] outperforms the algorithm presented here, because of its ability to average the noise over the full length of the burst. The results for blocks of 20 symbols, however, are favorable to the 'SVD+Viterbi' algorithm, indicating the real strength of the proposed algorithm for short bursts and fast-varying channels.

6. CONCLUSIONS 

In this paper we developed a new algorithm for blind channel equalization. The algorithm differs from previous algorithms in that it is fully adaptive. This approach has two advantages: the computational complexity can be lowered, and the model is able to track very fast varying channels. The performance of the algorithm was investigated through experiments in a GSM-type setup.
Appendix: Moulines et al. [8]

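Consider a processing window of length $r$. Define $\mathbf{y}[k] = [\mathbf{y}_1[k]^T \ldots \mathbf{y}_M[k]^T]^T$ with $\mathbf{y}_i[k] = [y_i[k+r-1] \ldots y_i[k]]^T$, and $\mathcal{H} = [\mathcal{H}_1^T \ldots \mathcal{H}_M^T]^T$ with the $r \times (r+L)$ matrices

$$\mathcal{H}_i = \begin{bmatrix} h_i[0] & \cdots & h_i[L] & & 0 \\ & \ddots & & \ddots & \\ 0 & & h_i[0] & \cdots & h_i[L] \end{bmatrix}$$

so that $\mathbf{y}[k] = \mathcal{H}\mathbf{x}[k]$ with $\mathbf{x}[k] = [x[k+r-1] \ldots x[k-L]]^T$. Define $\mathbf{h} = [\mathbf{h}_1^T \ldots \mathbf{h}_M^T]^T$ with $\mathbf{h}_i = [h_i[0] \ldots h_i[L]]^T$.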

Then the algorithm can be summarized as follows:

1. Compute the output covariance matrix $R_y$:
$$\hat{R}_y = \frac{1}{N-r}\sum_{k=1}^{N-r} \mathbf{y}[k]\,\mathbf{y}[k]^H$$

2. Compute the eigenvalue decomposition of $R_y$ and determine the noise subspace:
$$U = [U_1 \cdots U_{rM-(L+r)}]$$

3. Exploit the orthogonality of $U$ and $\mathcal{H}$ (i.e. $U^H\mathcal{H} = 0$) by minimizing $\mathbf{h}^H Q\,\mathbf{h}$ under the constraint $\|\mathbf{h}\| = 1$. The matrix $Q$ is built from the $M(L+1) \times (L+r)$ matrices $\mathcal{U}_i$, each constructed using the $M$ $(r \times 1)$ segments $U_i^{(1)}, \ldots, U_i^{(M)}$ of the noise vector $U_i$:
$$U_i = [U_i^{(1)T} \cdots U_i^{(M)T}]^T$$

We replaced the first two steps of the algorithm by an estimate of the left singular vectors forming the null space of

$$Y = [\mathbf{y}[1] \cdots \mathbf{y}[N-r]]$$

(because of its better numerical properties), and added a fourth step in which the estimated channel impulse response was fed into a Viterbi trellis to determine the original symbol sequence.
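The window stacking and the covariance estimate of step 1 can be sketched as follows, in plain Python with 0-based time indices. The function names are hypothetical, and normalizing by the number of complete windows is an assumption; the estimate is $YY^H$ scaled by the number of windows, where the columns of $Y$ are the stacked windows.

```python
def stacked_window(y_channels, k, r):
    """y[k] = [y_1[k]^T ... y_M[k]^T]^T with y_i[k] = [y_i[k+r-1], ..., y_i[k]]^T."""
    return [yi[k + r - 1 - t] for yi in y_channels for t in range(r)]


def sample_covariance(y_channels, r):
    """Average of y[k] y[k]^H over all complete windows (real or complex data)."""
    N = len(y_channels[0])
    windows = [stacked_window(y_channels, k, r) for k in range(N - r + 1)]
    dim = len(windows[0])
    R = [[0j] * dim for _ in range(dim)]
    for w in windows:
        for a in range(dim):
            for b in range(dim):
                R[a][b] += w[a] * complex(w[b]).conjugate()
    n = len(windows)
    return [[v / n for v in row] for row in R]


# Usage: M = 2 hypothetical symbol-rate channels, window length r = 2.
yc = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]
R_y = sample_covariance(yc, 2)       # 4 x 4 Hermitian estimate
```

Step 2 would then take the eigenvectors of `R_y` belonging to its $rM-(L+r)$ smallest eigenvalues; that step needs an eigensolver and is not sketched here.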
Abstract

Global convexity and fast convergence are both key points of blind, baud-rate equalization techniques. In previous works we proposed blind schemes based on linearly constrained Bussgang equalizers. Easy implementation and globally convex characteristics, under certain hypotheses, have been their main advantages over standard algorithms. However, our proposal shares the slow convergence behaviour of existing equalizers, and therefore our interest now turns to improving the convergence speed. Our main goal is the development of a blind updating technique based on more efficient stochastic gradient tools: conjugate gradient techniques, as a trade-off between complexity (matrix inversions are not required) and convergence speed.


1. Introduction 

Blind equalization techniques aim at retrieving the input data given only the channel output and some statistical or deterministic information about the channel input. At present, there are two main approaches to blind, baud-rate equalization. On the one hand, Bussgang algorithms are based on estimating the transmitted symbol with a zero-memory nonlinearity; implementation is simple, but the nonlinearity makes the cost functions non-convex. On the other hand, algorithms based on higher-order statistics equalize the transmitted sequence by identifying the minimum and maximum phase components of non-minimum phase channels; their cost functions are globally convex, but the computational requirements make practical implementation difficult in many applications, and a significant delay is usually needed to estimate the cumulants involved properly [Hay94]. In previous works [Zaz94, Zaz95] we developed a new scheme for blind equalization, labelled the Modified Decision Directed Algorithm (MDDA), which improves on the performance of the standard algorithm by providing convex cost functions under certain conditions. However, although this scheme is faster than the CMA or Stop-and-Go algorithms, its convergence speed with instantaneous gradient updating is not comparable to that of non-blind strategies.

As an alternative to instantaneous gradient schemes, it is well known that conjugate direction techniques have been developed for the quadratic problem, leading to significant practical advances. An extension to non-quadratic problems is also possible by introducing line search methods [Lue89]. Additionally, [Bor92] proposes the application of conjugate gradient techniques to adaptive filtering as a trade-off between computational complexity and convergence performance: the proposed method can provide convergence comparable to that of RLS schemes, with a computational complexity intermediate between those of the LMS and RLS methods.

Our main goal in this paper is to incorporate conjugate gradient techniques into the MDDA in order to achieve global convexity with convergence similar to that of non-blind gradient search schemes. In this analysis (CG-MDDA) we have considered binary transmission, first through a minimum phase channel to gain insight into the problem, and finally we propose a scheme to rapidly start up on a non-minimum phase channel. Simulations over a raised cosine impulse response with controlled amplitude distortion corroborate this desirable performance.


2. Our proposal. Minimum phase channel 

Let us start with a simple transmission scheme, assumed perfectly synchronised at baud rate (Fig. 1). The channel response $h(n)$ includes the transmitter filter, the channel and the receiver filter. First of all we will consider a minimum phase channel; in a later section we will extend our strategy to a generic non-minimum phase channel. Recall that the MDDA is based on anchoring one equalizer tap (the first one for a minimum phase channel and the center tap for a non-minimum phase channel) in order to preserve the current symbol, avoiding the presence of local minima. Also, another degree of freedom is introduced as an adaptive decision device for gain (and optionally phase) recovery.
Fig. 1. Binary transmission block diagram


The cost error function is very similar to that of the Decision Directed (DD) scheme, and the MDDA updates the parameter set in the direction of the negative instantaneous gradient:

$$J = E\{[y - p\,\mathrm{sign}(y)]^2\}$$

where $y$ is the equalizer output and $p$ is the gain of the adaptive decision device.

In a similar way, we have considered another gradient strategy to update the tap-weight vector: a conjugate gradient technique. In the general case, the whole adaptive parameter set $(\mathbf{c}, p)$ in the cost function should be included in the conjugate gradient scheme. However, in this case the behaviour of the equalizer taps is very different from that of the gain parameter: the cost is quadratic (at least for minimum phase channels) in the coefficient set and not quadratic (but convex) in the gain parameter. Therefore, we propose a combined adaptation rule: instantaneous gradient for the gain parameter and conjugate gradient for the equalizer coefficients (of course including the linear constraint):

$$p_{k+1} = p_k - \mu\,\nabla_p J, \qquad \mathbf{c}_{k+1} = \mathbf{c}_k + \alpha_k\,\mathbf{d}_k$$

Observe the important differences between the two updating rules. For the instantaneous gradient, the step size $\mu$ is fixed and the innovation is proportional to the instantaneous gradient. For the conjugate gradient technique, on the other hand, the step size $\alpha_k$ is chosen according to an optimality criterion, and the innovation is proportional to a conjugate direction $\mathbf{d}_k$ [Lue89]; both are to be determined. The key point of the algorithm is therefore to design an optimal condition for the line search method, minimising the cost error function along the line:

$$\min_{\alpha} J(\mathbf{c}_k + \alpha\,\mathbf{d}_k) \qquad (3)$$

It can be shown that, for minimum phase channels and assuming a conditioned Gaussian model for the ISI [Zaz95 and references therein], condition (3) is equivalent to minimising the cross-correlation between the equalizer output and part of the input vector (the whole vector except the first element, in the minimum phase case). This choice seems appropriate because we assume that the current data is not present in this part of the input vector, so that we are minimising the ISI. The expression we have found by solving the line search equation (3) is:

$$\alpha_k = -\frac{\mathbf{c}_k^T P^T R P\,\mathbf{d}_k}{\mathbf{d}_k^T P^T R P\,\mathbf{d}_k} \qquad (4)$$

where the superscript $T$ denotes transpose, $R$ is the input correlation matrix (which should be estimated recursively), and $P$ is the projection matrix onto the linear constraint.
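The gain part of the combined adaptation rule can be sketched in a few lines. Differentiating the instantaneous cost $(y - p\,\mathrm{sign}(y))^2$ with respect to $p$ gives $-2(y - p\,\mathrm{sign}(y))\,\mathrm{sign}(y)$; below, the factor 2 is absorbed into the step size `mu`, real BPSK symbols are assumed, and $\mathrm{sign}(0)$ is taken as $+1$ (both assumptions). The usage example feeds hypothetical noise-free equalizer outputs $y = 0.8\,s$, for which $p$ should converge to 0.8.

```python
def sign(v):
    """Hard decision for BPSK; sign(0) is taken as +1 (an assumption)."""
    return 1.0 if v >= 0 else -1.0


def gain_update(p, y, mu):
    """One instantaneous-gradient step on (y - p*sign(y))^2 with respect to p.

    dJ/dp = -2 (y - p sign(y)) sign(y); the factor 2 is absorbed into mu.
    """
    return p + mu * (y - p * sign(y)) * sign(y)


# Usage: noise-free equalizer outputs y = 0.8 s for hypothetical BPSK symbols s.
p = 0.0
for s in [1, -1, -1, 1] * 50:
    p = gain_update(p, 0.8 * s, mu=0.1)
```

Here each step reduces to $p_{k+1} = p_k + \mu(0.8 - p_k)$, so the error shrinks geometrically by the fixed factor $1-\mu$, which illustrates why a fixed-step rule is adequate for this single scalar parameter.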


The problem is then to compute the appropriate set of direction vectors. It is well known that the conjugate gradient method is the conjugate direction method obtained by selecting the successive direction vectors as a conjugate version of the successive gradients obtained as the method progresses:

$$\mathbf{d}_{k+1} = -\mathbf{g}_{k+1} + \beta_k\,\mathbf{d}_k \qquad (5)$$

where $\mathbf{g}_k$ denotes the successive gradients and the $\beta_k$ are constants chosen to provide R-conjugacy of the vector $\mathbf{d}_{k+1}$ with respect to the previous direction vectors. To calculate the iterative conjugate directions, we have applied an extension to non-quadratic problems known as the Fletcher-Reeves method [Lue89]:

$$\beta_k = \frac{\mathbf{g}_{k+1}^T\,\mathbf{g}_{k+1}}{\mathbf{g}_k^T\,\mathbf{g}_k} \qquad (6)$$
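The direction recursion (5)-(6) together with an exact line search can be illustrated on a purely quadratic surrogate. The sketch below is the textbook linear conjugate gradient method for $J_q(\mathbf{c}) = \mathbf{c}^T R\,\mathbf{c}$ under the linear constraint $c_0 = 1$ (first tap anchored, as in the minimum phase case); the projection simply zeroes the gradient component of the anchored tap. The step used is the exact minimizer $-\mathbf{c}^T R\,\mathbf{d} / \mathbf{d}^T R\,\mathbf{d}$ of this surrogate, not the ISI-based step (4), and the matrix $R$ and all function names are hypothetical. For the correlation used in the example, the minimizer is $[1, -0.5, 0]$, and with two free taps conjugate gradients should reach it in two iterations (up to rounding).

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_anchored_first_tap(R, max_iter=50, tol=1e-30):
    """Minimize c^T R c subject to c[0] = 1 with Fletcher-Reeves conjugate gradients."""
    n = len(R)
    c = [1.0] + [0.0] * (n - 1)

    def projected_gradient(c):
        # Gradient 2 R c with the anchored tap's component removed (projection P).
        g = [2.0 * v for v in matvec(R, c)]
        return [0.0] + g[1:]

    g = projected_gradient(c)
    d = [-v for v in g]
    for _ in range(max_iter):
        gg = dot(g, g)
        if gg < tol:
            break
        Rd = matvec(R, d)
        alpha = -dot(c, Rd) / dot(d, Rd)      # exact minimizer of (c + a d)^T R (c + a d)
        c = [ci + alpha * di for ci, di in zip(c, d)]
        g_new = projected_gradient(c)
        beta = dot(g_new, g_new) / gg         # Fletcher-Reeves ratio, as in (6)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return c


# Usage: hypothetical AR(1)-type input correlation with coefficient 0.5.
R = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
c = cg_anchored_first_tap(R)
```

Because the anchored tap never moves, the constraint is enforced exactly at every iteration, without a Lagrange multiplier.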


3. Simulations. Minimum phase channel

A very simple but intuitive result is shown in this section. Let us consider a one-pole channel (z = -0.5) and a two-tap equalizer. We have represented the error surface in order to observe the different trajectories followed by the instantaneous gradient and the conjugate gradient algorithms (see Fig. 2). A very interesting point is the difference in convergence speed: about 10 samples for the CG-MDDA and almost two hundred for the MDDA (see Fig. 3). With this performance, we must point out that the cost function is still convex, but the convergence speed could now be comparable to that of trained stochastic gradient schemes.
Fig. 2. Error surface and trajectories of the instantaneous gradient and conjugate gradient algorithms.

Fig. 3. Evolution of the coefficients of the instantaneous gradient and conjugate gradient algorithms.

4. Non-minimum phase channel

A minimum phase system could be a good model for most radio multipath propagation channels, where a strong direct ray is received along with a few reflected rays with varying delays, amplitudes and phases. However, in other applications, such as data transmission over telephone lines, the channels involved have a non-minimum phase characteristic.

Although the extension to this new situation may seem intuitive (fixing the center tap as the main carrier of the current data, and again leaving parameter p for gain adjustment), the analysis of this situation increases in complexity. A closed-form expression for the optimal step size such as equation (4) could not be found, because the current symbol is present in both the preceding and the following received samples within the equalizer window. Therefore, we have followed a development similar to that in [Zaz95] to justify a condition equivalent to the one given by (4): we conjecture that the main part of the current symbol is captured by fixing the central tap. In spite of the fact that this symbol is also present in the remainder of the input vector, we have observed for several real channels that the linear constraint preserves the current data, avoiding any possible misconvergence (the cancellation of this symbol would lead to a local minimum). Therefore, we have a very simple key to a fast start-up on a non-minimum phase channel: implementing the same expression as equation (4), where matrix P is now the projection matrix onto the linear constraint that fixes the center tap. Of course, we realise that this strategy cannot assure correct convergence for every channel, but in the next section we verify by simulations that it could be a reasonable trade-off criterion for channels with moderate distortion.


5. Simulations. Non minimum phase channel. 

One of the most critical points of blind equalization is that its convergence speed is not comparable to that of non-blind algorithms. Of course, RLS algorithms will always be much faster, but it is also known that their computational requirements (matrix inversions) can be a serious disadvantage in many applications. In this section we evaluate the evolution of the CG-MDDA and compare its performance with that of the trained LMS algorithm (the most popular non-blind adaptation rule).

The channel we have chosen is taken from [Bor92, Hay91]: a raised cosine impulse response in which W controls the amplitude distortion of the channel (eigenvalue spread). We have considered two different situations, with low and high eigenvalue spread, to show that the performance is competitive in both cases, also showing an interesting property of conjugate gradient techniques: the convergence speed is almost independent of the eigenvalue spread, in contrast with LMS, which depends strongly on it.

$$h(n) = \begin{cases} 0.5\left[1 + \cos\left(\dfrac{2\pi}{W}(n-2)\right)\right], & n = 1, 2, 3 \\ 0, & \text{otherwise} \end{cases}$$

In Fig. 4 we have considered the parameter W = 2.9, which implies a low eigenvalue spread of 6.07. In Fig. 5 we have chosen W = 3.5 as the worst case, with an eigenvalue spread of 46.8. In both cases we have used the same conditions, choosing the starting point randomly to support the convex characteristics of the CG-MDDA error surface. The equalizer length is set to 11 taps (the center tap is kept fixed) and the signal-to-noise ratio is 30 dB.
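The two eigenvalue spreads quoted for W = 2.9 and W = 3.5 can be reproduced approximately from the channel definition. The sketch below assumes a unit-variance white binary input, additive white noise of variance 0.001 (roughly 30 dB below the channel output power; this noise convention is our assumption), and the 11 × 11 Toeplitz correlation matrix of the equalizer input. Eigenvalues are computed with a cyclic Jacobi iteration so that no external library is needed; all names are hypothetical.

```python
import math


def raised_cosine_channel(W):
    """h(n) = 0.5 [1 + cos(2*pi*(n-2)/W)] for n = 1, 2, 3."""
    return [0.5 * (1.0 + math.cos(2.0 * math.pi * (n - 2) / W)) for n in (1, 2, 3)]


def jacobi_eigenvalues(A, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def eigenvalue_spread(W, taps=11, noise_var=0.001):
    """lambda_max / lambda_min of the taps x taps equalizer input correlation matrix."""
    h = raised_cosine_channel(W)
    r = [0.0] * taps
    for lag in range(3):                                # unit-variance white binary input
        r[lag] = sum(h[i] * h[i + lag] for i in range(3 - lag))
    r[0] += noise_var
    R = [[r[abs(i - j)] for j in range(taps)] for i in range(taps)]
    ev = jacobi_eigenvalues(R)
    return max(ev) / min(ev)


# Usage: compare with the spreads quoted for Figs. 4 and 5.
spreads = {W: eigenvalue_spread(W) for W in (2.9, 3.5)}
```

A small change in `noise_var` mostly shifts the smallest eigenvalue, so the W = 3.5 case is noticeably more sensitive to the assumed noise level than the W = 2.9 case.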

The implementation of conjugate gradient adaptive filtering is not as simple as in a purely quadratic problem: we have followed the recommendation of [Bor92] and used an averaging window to estimate the gradient. In these simulations we have chosen an averaging window of three samples (an interesting discussion of this issue can be found in [Bor92], and we have not observed any significant improvement for wider windows).


Fig. 4. Eigenvalue spread: 6.07. Learning curves of the CG-MDDA and LMS.

Fig. 5. Eigenvalue spread: 46.8. Learning curves of the CG-MDDA and LMS.

Finally, we point out that our proposal could be implemented more efficiently in two steps: the CG-MDDA is used only to open the eye diagram quickly, and a second step (after about 50 samples) is driven by the MDDA to reduce the complexity. In this way, the computation and storage requirements are comparable to those of the LMS.


6. Conclusions and lines of further research.

The main conclusion we want to point out is that we have sped up a globally convex blind equalizer to a convergence speed similar to that of trained strategies, independently of the eigenvalue spread. We have introduced a different updating rule, using an instantaneous gradient technique for the adaptation of the gain parameter and a conjugate gradient technique to update the equalizer taps. Although the calculation of the optimal step size is only available for minimum phase channels, an intuitive approach has been developed for non-minimum phase channels. Simulations over channels with controlled amplitude distortion support the proposed scheme.

Also recall that minimum phase channels could be considered a good model for many situations in mobile radio applications, where fast blind equalization is desirable.

The use of non-instantaneous gradient techniques, such as conjugate gradient techniques, to speed up blind algorithms could be applied to other Bussgang equalizers. This is a line of further research in which much theoretical work remains to be done to design the optimum step size for the line search method that minimizes the desired cost function.
ABSTRACT

This paper introduces a means of generating synthetic facial image sequences from speech. Using the Facial Action Coding System (FACS), facial expressions related to speech are correlated with phonemes. The Genetic Algorithm is introduced as a means of generating the parameters for manipulating a neutral face image to match the intended target image. Examples generated by a prototype system are also included.

1. INTRODUCTION 

Somewhere in our collective imagination, we have all envisioned the day when computer systems can communicate and interact with humans in a natural, instinctive fashion. Motivated by this possibility, there has been tremendous interest in developing signal processing systems that are capable of understanding and emulating human expressions, gestures, and speech. While text-to-speech synthesizers have made great advances since their introduction over 20 years ago, realistic facial animation systems are still at an embryonic stage of development [1]. The most common communication modality between humans is speech; likewise, we are interested in systems that can not only generate and recognize realistic human speech but are also capable of synthesizing the corresponding facial animation.

Although the expressive bandwidth of the entire human face is enormous, the region containing the mouth and lips is, arguably, the most involved in communication. We have elected to concentrate on a system that emulates lip motion because it is an obvious and natural complement to speech synthesis. There are many different problems associated with the development of a system for generating synthetic lip movements from speech. These include:

• Extracting phonemes and timing information from speech or text.

• Developing an anatomic and muscular model that may be used to generate lip movements from a command string.

• Developing a distortion measure that may be used to measure the performance of a sequence of lip movements.

• Searching for the optimum sequence of model parameters that will generate the most natural-looking lip and facial motion.

In this paper, we will focus primarily on the last of these problems: the search for the optimum model parameters.

Much of this work was conducted for A. Peng's dissertation research.

2. MODELING OF FACIAL EXPRESSION


Reducing the seemingly infinite expression space of the human face to a finite parameter-based model is a difficult task. To quantify and represent this space, we employ a facial expression representation based on the Facial Action Coding System (FACS), which was developed by two psychologists, P. Ekman and W. V. Friesen, in 1977 [6]. FACS is widely used in contemporary facial animation engines and offers a comprehensive parameterization of facial expressions. This approach uses action units (AUs) to describe the changes in the appearance of the face caused by the contraction of an isolated or a contiguous group of facial muscles. FACS furnishes 46 AUs for describing facial expressions and 12 AUs for gaze directions and head movements. However, we are primarily concerned with the four groups of lower-face AUs germane to speech. These four groups, classified by their translation or rotation along the primary axes, are denoted as the vertical, horizontal, orbital, and oblique AUs. We also use two action units whose unique displacement escaped classification; these are referred to as miscellaneous. Table 1 contains the 11 AUs that are involved in speech production.

  Group           Action Unit   Description
  Vertical        AU10          Raise Upper Lip
                  AU15          Depress Lip Corner
                  AU25          Part Lip
                  AU26          Drop Jaw
                  AU27          Open Mouth Wide
  Horizontal      AU20          Stretch Lip Horizontal
  Oblique         AU18          Pucker Lip
                  AU22          Funnel Lip
  Orbital         AU12          Stretch Lip Corner
  Miscellaneous   AU29          Thrust Jaw
                  AU32          Bite

Table 1: The 11 Speech-Related Action Units

2.1. WIRE-FRAME HEAD MODEL

In order to generate each facial expression, a neutral face image has to be modified. The person's image is first texture-mapped onto a 3D wire-frame head model. To correct for facial asymmetries, each model has been tailored to fit the anatomical structure of the person's head. As AUs manipulate the location of specific nodes on the wire-frame model, the image wrapped around the model is consequently distorted. This procedure allows us to render facial expressions.

Because AUs represent displacement vectors at discrete points, e.g., $AU = (dx\ dy\ dz)^T$, they are additive and non-orthogonal. For example, several AUs can be combined to form new action units. By exploiting this property, we can use weighted, linear combinations of these 11 AUs as a basis set for synthesizing all speech-related motions:

$$B = [AU10, AU12, AU15, \ldots, AU32].$$

For convenience, we let vector $\mathbf{v}_1$ represent AU10, $\mathbf{v}_2$ represent AU12, and so on, until we have 11 vector components to represent our AU-space, such that

$$V = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_{11}] = \begin{pmatrix} v_{1,1} & v_{1,2} & \cdots & v_{1,11} \\ v_{2,1} & v_{2,2} & \cdots & v_{2,11} \\ v_{3,1} & v_{3,2} & \cdots & v_{3,11} \end{pmatrix}.$$

2.2. PHONEMES

Phonemes are the distinctive, discrete sounds that compose speech. The production of each phoneme is often correlated with a particular contortion of a facial muscle group, making FACS an ideal tool for quantifying expressions. The goal of our system is to map speech phonemes to facial expressions by constructing the expressions from a weighted, linear combination of AUs. Thus, for the synthesis of lip movements, we must identify the optimum eleven-dimensional AU weight vector $\mathbf{w}$ for each frame in an image sequence. An expression $\mathbf{f}$ can be constructed as

$$\mathbf{f} = w_1\mathbf{v}_1 + w_2\mathbf{v}_2 + \cdots + w_{11}\mathbf{v}_{11} = V\mathbf{w}, \qquad (1)$$

where $w_r$, $r = 1, 2, \ldots, 11$, are the corresponding expression weights. For example, to model the upper lip raise, $w_1 = 1$ and $w_i = 0$, $i = 2, 3, \ldots, 11$.
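Equation (1) is a matrix-vector product evaluated per wire-frame node. The sketch below assumes the column order $\mathbf{v}_1$ = AU10, $\mathbf{v}_2$ = AU12, followed by the remaining AUs of Table 1 in increasing number, and a hypothetical per-node 3 × 11 displacement matrix. The deformation step (adding the weighted displacement to each neutral node position) is our illustration of the wire-frame manipulation described in Section 2.1, not the authors' renderer.

```python
# Assumed column order of V, following B = [AU10, AU12, AU15, ..., AU32].
AU_ORDER = ["AU10", "AU12", "AU15", "AU18", "AU20", "AU22",
            "AU25", "AU26", "AU27", "AU29", "AU32"]


def expression_displacement(V, w):
    """f = V w for one node: V is 3 x 11, column j = (dx, dy, dz) of AU j."""
    if len(w) != len(AU_ORDER) or any(len(row) != len(w) for row in V):
        raise ValueError("expected a 3 x 11 AU matrix and 11 weights")
    return [sum(v * wj for v, wj in zip(row, w)) for row in V]


def deform_mesh(neutral_nodes, V_per_node, w):
    """Move each wire-frame node by its own weighted AU displacement."""
    return [[p + d for p, d in zip(node, expression_displacement(V, w))]
            for node, V in zip(neutral_nodes, V_per_node)]


# Usage: one hypothetical node; w selects half of an upper lip raise (AU10 only).
V_node = [[float(10 * r + j) for j in range(11)] for r in range(3)]
w_half_raise = [0.5] + [0.0] * 10
moved = deform_mesh([[1.0, 1.0, 1.0]], [V_node], w_half_raise)
```

Since the map is linear in $\mathbf{w}$, combining AUs amounts to adding their weighted displacements, which is the additivity property the basis $B$ relies on.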


3. ITERATIVE FACIAL EXPRESSIONS 

In order to determine the expression weight vector for a particular phoneme, we must train our system with a target image representing this expression. Given a target image $T$ with an expression that we want to synthesize, $\hat{\mathbf{w}}$ represents the parameterization of the target expression, e.g., $f_{parm}: T \to \hat{\mathbf{w}}$. Let $\mathcal{D}(\mathbf{w}, \hat{\mathbf{w}})$ be a function that measures the distortion between two parameterized expressions $\mathbf{w}$ and $\hat{\mathbf{w}}$. A synthetic facial image $S$ wearing an expression parameterized by $\mathbf{w}$ is generated from an initial, neutral expression image $N$, e.g., $f_{synth}: (N, \mathbf{w}) \to S$. Our goal is to find $\mathbf{w}$ such that

$$\min_{\mathbf{w}} \mathcal{D}(\mathbf{w}, \hat{\mathbf{w}}). \qquad (2)$$

If $N$ contains the same person as $T$, then

$$\|T - S\| \to 0. \qquad (3)$$

To solve this optimization problem, we use the genetic algorithm to find $\mathbf{w}$ automatically.

3.1. GENETIC ALGORITHM 

Genetic Algorithms (GAs) are machine learning techniques based on the principles of genetic variation and natural selection [4]. Developed by Holland (1975) as an attempt to model various natural phenomena, GAs are widely used to model evolution in artificial-life systems. The problem of finding an optimum weight vector $\mathbf{w}$ for a given phonemic facial expression presents a challenge because the 11 weight components of $\mathbf{w}$ must be found simultaneously. We are attracted to the GA's simple and elegant nature as well as to its power to rapidly arrive at good solutions to complex high-dimensional problems [5]. Moreover, GAs can provide an exceptionally powerful search heuristic for large, complex spaces if the space to be searched is not well understood, is relatively unstructured, and can be effectively represented by a GA [5].

The GA method works by evolving one population
of chromosomes into a new population, or generation, using selection combined with various operators. In our
case, bit strings representing candidate weight vectors
play the role of chromosomes, with individual weights
serving as the genes. The selection process is driven
by a fitness function that evaluates the chromosomes
in a population, identifies those that will be allowed to
reproduce, and decides the number of offspring each is
likely to have. As in nature, the fittest chromosomes
produce more offspring than less fit ones. The two categories of genetic operators that manipulate chromosomes are known as crossover and mutation operators.
Crossover operators trade genes between chromosome
pairs, whereas mutation operators alter genes within a
single chromosome.


3.1.1. GENETIC OPERATORS 


There are four types of crossover operators utilized in
our system: simple, arithmetic, single, and heuristic.
See Table 2 for the definitions of the first three. Consider the chromosome pair $\mathbf{w}_1 = (w_{1,1}\ w_{1,2} \cdots w_{1,s} \cdots w_{1,11})^T$ and
$\mathbf{w}_2 = (w_{2,1}\ w_{2,2} \cdots w_{2,s} \cdots w_{2,11})^T$, where an integer s
in the range [1, 11] is randomly selected as the crossover
site. After the crossover operation, this pair becomes $\mathbf{w}_1'$
and $\mathbf{w}_2'$, respectively. The heuristic crossover operator
uses domain-specific knowledge to encourage the replication of chromosomes that carry certain gene combinations. More specifically, this operator selectively
adjusts certain weight components so that candidate
weight vectors reflect the expression. For example, if
the face in the target image has a widely opened mouth,
the weight components applied to AU26 or AU27 will dominate. Heuristic operators improve performance by considerably reducing the parameter search space.

Operator      Chromosomes after the Operator

Simple        $\mathbf{w}_1' = (w_{1,1}\ w_{1,2} \cdots w_{1,s}\ w_{2,s+1} \cdots w_{2,11})$
              $\mathbf{w}_2' = (w_{2,1}\ w_{2,2} \cdots w_{2,s}\ w_{1,s+1} \cdots w_{1,11})$

Arithmetic    $\mathbf{w}_1' = (w_{1,1}\ w_{1,2} \cdots \alpha w_{2,s} + (1-\alpha) w_{1,s} \cdots \alpha w_{2,11} + (1-\alpha) w_{1,11})$
              $\mathbf{w}_2' = (w_{2,1}\ w_{2,2} \cdots \alpha w_{1,s} + (1-\alpha) w_{2,s} \cdots \alpha w_{1,11} + (1-\alpha) w_{2,11})$
              $0 < \alpha < 1$

Single        $\mathbf{w}_1' = (w_{1,1}\ w_{1,2} \cdots w_{2,s} \cdots w_{1,11})$
              $\mathbf{w}_2' = (w_{2,1}\ w_{2,2} \cdots w_{1,s}\ w_{2,s+1} \cdots w_{2,11})$

Table 2: Genetic Crossover Operators, excluding the
Heuristic Crossover
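The three operators of Table 2 can be sketched as plain Python functions on lists of 11 weights. The site s is 1-based, as in the text; the function names and the uniform rule used to draw s are illustrative assumptions, and the heuristic crossover is omitted because it depends on domain knowledge that the text does not specify.

```python
import random
from typing import List, Tuple

Chromosome = List[float]  # 11 expression weights, one per action unit

def crossover_site(rng: random.Random, length: int = 11) -> int:
    """Pick the crossover site s uniformly from [1, length] (1-based, as in Table 2)."""
    return rng.randint(1, length)

def simple_crossover(w1: Chromosome, w2: Chromosome, s: int) -> Tuple[Chromosome, Chromosome]:
    """Keep genes 1..s of each parent and swap all genes after site s."""
    return w1[:s] + w2[s:], w2[:s] + w1[s:]

def arithmetic_crossover(w1: Chromosome, w2: Chromosome, s: int,
                         alpha: float) -> Tuple[Chromosome, Chromosome]:
    """From site s to the end, blend genes: alpha * other parent + (1 - alpha) * own gene."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    tail1 = [alpha * b + (1 - alpha) * a for a, b in zip(w1[s - 1:], w2[s - 1:])]
    tail2 = [alpha * a + (1 - alpha) * b for a, b in zip(w1[s - 1:], w2[s - 1:])]
    return w1[:s - 1] + tail1, w2[:s - 1] + tail2

def single_crossover(w1: Chromosome, w2: Chromosome, s: int) -> Tuple[Chromosome, Chromosome]:
    """Exchange only the gene at site s."""
    c1, c2 = list(w1), list(w2)
    c1[s - 1], c2[s - 1] = w2[s - 1], w1[s - 1]
    return c1, c2
```

For example, with parents of all zeros and all ones and s = 4, the simple crossover yields four zeros followed by seven ones, and vice versa.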

To perturb individual chromosomes, uniform and
non-uniform mutation operators are implemented. Given
a chromosome $\mathbf{w} = (w_1\ w_2 \cdots w_s \cdots w_{11})^T$, an integer s
in the range [1, 11] is selected as the site where mutation
will occur. During uniform mutation, the gene at site s,
$w_s$, is replaced with a random number β which is uniformly distributed on the interval [0, 1], i.e., $w_s \to \beta$.
Otherwise, for non-uniform mutation, $w_s \to w_s + \delta$,
where δ is a random number non-uniformly distributed
on [0, 1]. A special mutation operator has also been developed, referred to as the complement mutation, which
replaces $w_s$ with its complement, i.e., $w_s \to 1.0 - w_s$.
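The three mutation operators can be sketched in the same style. The text does not specify the distribution of δ for non-uniform mutation, so this sketch assumes δ = u³ with u uniform (which favors small steps) and clips the mutated gene to 1.0; both choices are hypothetical.

```python
import random
from typing import List

def uniform_mutation(w: List[float], s: int, rng: random.Random) -> List[float]:
    """w_s -> beta, with beta drawn uniformly from [0, 1)."""
    child = list(w)
    child[s - 1] = rng.random()
    return child

def non_uniform_mutation(w: List[float], s: int, rng: random.Random,
                         shape: float = 3.0) -> List[float]:
    """w_s -> w_s + delta, with delta in [0, 1) skewed toward small values.

    ASSUMPTION: delta = u ** shape with u uniform, and the result is clipped to 1.0;
    the text only states that delta is non-uniformly distributed on [0, 1].
    """
    child = list(w)
    delta = rng.random() ** shape
    child[s - 1] = min(1.0, child[s - 1] + delta)
    return child

def complement_mutation(w: List[float], s: int) -> List[float]:
    """w_s -> 1.0 - w_s."""
    child = list(w)
    child[s - 1] = 1.0 - child[s - 1]
    return child
```

All three return a new list, so the parent chromosome stays available for the fitness comparison.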


3.1.2. GENETIC EVOLUTION CYCLE 

The initial population has an expression weight vector of all zeros, which corresponds to a neutral facial
expression, i.e., mouth closed. The evolution cycle is
guided by system parameters that influence selection
dynamics. The single, arithmetic, and simple crossover
operators each have an associated probability of occurrence, denoted $P_{snc}$, $P_{ac}$, and $P_{sic}$, respectively.
The uniform and non-uniform mutation operators also
have probabilities, denoted $P_{um}$ and $P_{nm}$, respectively. Uniform mutation is primarily used during early
iterative stages in order to evaluate a wide range of
candidate chromosomes; hence, $P_{um} \gg P_{nm}$. Each
time a better solution is found, $P_{um}$ is decreased while
$P_{nm}$ is increased. Towards the end of the evolutionary
cycle, non-uniform mutation becomes more prevalent
so that gene selection can be finely tuned; thus,
$P_{nm} \gg P_{um}$. However, if no improvements are made
after a certain number of iterations, the complement
mutation is applied to stimulate the gene pool with
candidates that may have been overlooked. All mutation probabilities are then reset to their initial values.
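One way to realize this adaptive mutation schedule is a small controller object, sketched below. The initial probabilities, the amount moved from P_um to P_nm after each improvement, and the stagnation limit are hypothetical values; the text gives none of them.

```python
from dataclasses import dataclass, field

@dataclass
class MutationSchedule:
    """Hypothetical controller for P_um / P_nm; all default values are illustrative."""
    p_um_init: float = 0.9   # uniform mutation dominates early (P_um >> P_nm)
    p_nm_init: float = 0.1
    step: float = 0.1        # probability moved from P_um to P_nm per improvement (assumed)
    patience: int = 10       # iterations without improvement before complement mutation (assumed)
    p_um: float = field(init=False)
    p_nm: float = field(init=False)
    stale: int = field(init=False)

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Restore all mutation probabilities to their initial values."""
        self.p_um, self.p_nm, self.stale = self.p_um_init, self.p_nm_init, 0

    def update(self, improved: bool) -> bool:
        """Record one iteration; return True when a complement mutation should be applied."""
        if improved:
            shift = min(self.step, self.p_um)
            self.p_um -= shift   # uniform mutation becomes rarer ...
            self.p_nm += shift   # ... and non-uniform (fine-tuning) mutation more frequent
            self.stale = 0
            return False
        self.stale += 1
        if self.stale >= self.patience:
            self.reset()         # probabilities go back to their initial values
            return True
        return False
```

The caller applies the complement mutation whenever `update` returns True; moving probability mass (rather than scaling each value independently) keeps P_um + P_nm constant.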

Both mutation and crossover operators may generate expression weight vectors that do not correspond
to any meaningful phonemic expressions. For instance,
weights associated with AU27 (mouth open) and AU32
(lip bite) should not both be dominant, since biting
one's lip is practically impossible with an open mouth. Heuristic constraints have been added to eliminate such weight
vectors, further shrinking the solution space and increasing algorithm robustness.
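The heuristic constraint can be expressed as a filter over candidate weight vectors. This sketch assumes a hypothetical layout in which weights are keyed by AU name and treats a weight as dominant when it exceeds 0.5; the paper does not state a threshold, and only the AU27/AU32 pair is taken from the text.

```python
from typing import Dict, Iterable, List, Tuple

WeightVector = Dict[str, float]  # hypothetical layout: AU name -> weight in [0, 1]

# Pairs of AUs that should not both dominate; only AU27/AU32 comes from the text.
EXCLUSIVE_PAIRS: List[Tuple[str, str]] = [("AU27", "AU32")]

def is_plausible(w: WeightVector, dominance: float = 0.5,
                 pairs: Iterable[Tuple[str, str]] = EXCLUSIVE_PAIRS) -> bool:
    """False if both AUs of any exclusive pair exceed the (assumed) dominance threshold."""
    return not any(w.get(a, 0.0) > dominance and w.get(b, 0.0) > dominance
                   for a, b in pairs)

def apply_constraints(population: List[WeightVector],
                      dominance: float = 0.5) -> List[WeightVector]:
    """Drop implausible candidates before they reach the fitness evaluation."""
    return [w for w in population if is_plausible(w, dominance)]
```

Filtering before the fitness evaluation avoids spending synthesis time on expressions that cannot occur.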

After the population of chromosomes has been altered by genetic operators, it is filtered by a fitness
function designed to rank its members. Fitness is measured by evaluating the distortion between the synthesized and the target image. We have identified five
control points surrounding the lips: center of the outer
upper lip, center of the inner upper lip, center of the
inner bottom lip, center of the outer bottom lip, and the
corner of the lip. The distances of all of these points
are taken with respect to the center of the eye. By separating the horizontal from the vertical components,
we can make quantitative measurements of the distances
between points and hence describe the lip's
shape.

Since the target image can contain a different person than the synthesized image, distances must be normalized by an expression ratio defined as

$$R = \frac{\text{distance of a control point on the neutral face}}{\text{distance of the same point on the target}}$$

The distortion D is computed as a weighted sum of the
differences between the corresponding ratio components
of the target and synthesized expressions. It is given
by

$$D = \sum_{i=1}^{5} W_i \left( \lvert R_{(t,h)i} - R_{(s,h)i} \rvert + \lvert R_{(t,v)i} - R_{(s,v)i} \rvert \right), \qquad (4)$$

where i = 1, ..., 5; $R_{(s,h)i}$ and $R_{(s,v)i}$ are the horizontal
and vertical expression ratios of the synthesized image, respectively; $R_{(t,h)i}$
and $R_{(t,v)i}$ are the horizontal and vertical expression
ratios of the target image, respectively; and $W_i$ is a real weight in [0, 1].
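Following the expression ratio definition and eq. (4), the distortion can be computed as below. The sketch assumes the per-point ratios have already been measured as (horizontal, vertical) pairs and uses absolute differences, which is an assumption since the text only speaks of a weighted sum of differences; the function names are illustrative.

```python
from typing import Sequence, Tuple

RatioPair = Tuple[float, float]  # (horizontal, vertical) expression ratio of one control point

def expression_ratio(neutral_distance: float, expressed_distance: float) -> float:
    """R = distance on the neutral face / distance of the same point on the expression."""
    return neutral_distance / expressed_distance

def distortion(target: Sequence[RatioPair], synth: Sequence[RatioPair],
               weights: Sequence[float]) -> float:
    """Weighted sum over control points of the horizontal and vertical ratio differences.

    ASSUMPTION: absolute differences are used; each weight W_i lies in [0, 1].
    """
    if not len(target) == len(synth) == len(weights):
        raise ValueError("need one ratio pair and one weight per control point")
    total = 0.0
    for (th, tv), (sh, sv), w in zip(target, synth, weights):
        total += w * (abs(th - sh) + abs(tv - sv))
    return total
```

A lower distortion means a better match, so a fitness score for selection would be any decreasing function of this value.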

3.1.3. GENETIC SELECTION 

Generation selection is implemented by a roulette wheel
method with non-duplicate reproduction and elitism.
In this approach, the fitness scores of the population are
first summed, i.e.,

$$Q = \sum_{i=1}^{N} \mathit{fitness}(\mathbf{w}_i), \qquad (5)$$

where N represents the number of candidate chromosomes
in the generation and fitness(·) denotes the fitness
function described in Section 3.1.2. A random integer p
in the range [0, Q] is generated. Each population member (candidate weight vector) is examined sequentially
to determine whether its fitness score and those of the preceding
members are collectively equal to or greater than p. The
first chromosome satisfying this condition is chosen as
a member of the new population. Duplicate members
are removed, and the most fit, or elite, member of the
current population is always retained in the new generation. This entire process is repeated
until all members of the new population are found.
Generally, the fitness level of the population increases with the number of evolutionary iterations.
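A sketch of roulette-wheel selection with non-duplicate reproduction and elitism. It keeps running sums of the fitness scores and uses a binary search to find the first member whose cumulative fitness reaches p. Unlike the text, p is drawn as a real number rather than an integer, and the sketch assumes strictly positive fitness scores and hashable (tuple) chromosomes.

```python
import random
from bisect import bisect_left
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple

Chromosome = Tuple[float, ...]  # tuples are hashable, which makes duplicate checks cheap

def roulette_select(population: Sequence[Chromosome],
                    fitness: Callable[[Chromosome], float],
                    size: int,
                    rng: random.Random) -> List[Chromosome]:
    """Roulette-wheel selection with non-duplicate reproduction and elitism."""
    scores = [fitness(c) for c in population]
    if any(f <= 0 for f in scores):
        raise ValueError("this sketch assumes strictly positive fitness scores")
    if size > len(set(population)):
        raise ValueError("not enough distinct chromosomes to fill the new generation")
    cumulative = list(accumulate(scores))   # running sums of the fitness scores
    q = cumulative[-1]                      # Q in eq. (5)
    elite = population[max(range(len(population)), key=scores.__getitem__)]
    chosen, seen = [elite], {elite}         # elitism: the best member always survives
    while len(chosen) < size:
        p = rng.uniform(0.0, q)             # spin the wheel
        idx = bisect_left(cumulative, p)    # first member whose running sum reaches p
        candidate = population[min(idx, len(population) - 1)]
        if candidate not in seen:           # non-duplicate reproduction
            chosen.append(candidate)
            seen.add(candidate)
    return chosen
```

The two guards ensure the loop can terminate: every member has a positive chance of being drawn, and there are enough distinct members to fill the new generation.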


3.1.4. FRAME INTERPOLATION 

While language can be dissected into phonemic components, stacking phonemes sequentially will not produce
natural-sounding speech. Likewise, synthetic images
representing phonemes only supply key frames in a natural animation sequence. Moreover, phonemic structures do not provide any timing information, so the natural expressive quality is often lost. We used two interpolation methods between phonemic key frames to simulate fluent animation: position and Expression Weight
Vector interpolation [2]. With position interpolation,
vertices on the 3D wire-frame model are adjusted according to the displacement between key frames $F_i$ and
$F_{i+1}$, i.e.,

$$\mathbf{x}_{n,k} = \alpha \mathbf{x}_{i,k} + (1-\alpha)\mathbf{x}_{i+1,k} + \alpha(\mathbf{x}_{i,k} - \mathbf{x}_{i+1,k}), \qquad (6)$$

where $0 < \alpha < 1$, k = 1, 2, ..., N, n = 1, 2, ..., M, and
$\mathbf{x}_{n,k} = (x_{n,k}\ y_{n,k}\ z_{n,k})^T$. N is the number of vertices and
M is the number of intermediate frames to be generated
between $F_i$ and $F_{i+1}$. Vector interpolation builds each
new expression vector $\mathbf{w}_n$ between $\mathbf{w}_i$ and $\mathbf{w}_{i+1}$ using
linear interpolation, i.e.,

$$\mathbf{w}_n = \alpha \mathbf{w}_i + (1-\alpha)\mathbf{w}_{i+1}, \qquad (7)$$

where n and α are as in eq. (6).
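Expression Weight Vector interpolation, eq. (7), is straightforward to sketch. The text does not say how α varies with the frame index n, so this sketch assumes evenly spaced values α_n = 1 − n/(M + 1), which move the intermediate frames from w_i toward w_{i+1}.

```python
from typing import List, Sequence

def interpolate_weights(w_i: Sequence[float], w_next: Sequence[float],
                        m: int) -> List[List[float]]:
    """Return M intermediate weight vectors w_n = alpha_n * w_i + (1 - alpha_n) * w_next.

    ASSUMPTION: alpha_n = 1 - n / (M + 1) for n = 1..M (evenly spaced frames).
    """
    frames = []
    for n in range(1, m + 1):
        alpha = 1.0 - n / (m + 1)
        frames.append([alpha * a + (1.0 - alpha) * b for a, b in zip(w_i, w_next)])
    return frames
```

Each returned vector would then be passed to the synthesizer like any key-frame weight vector; interpolating the 11 weights is much cheaper than interpolating every wire-frame vertex as in eq. (6).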

4. EXPERIMENTS ON PROTOTYPE 
SYSTEM 

To evaluate the system described above, we have implemented a prototype system, developed by A. Peng,
on a Windows NT platform using the OpenGL graphics library and C [3]. Neutral and target face images
are displayed, permitting the user to interactively select the lip control locations using the mouse. After the user
specifies the maximum number of iterations in the
modeling process, expression weight vectors are passed
as parameters to the synthesizer. The synthesizer updates the facial image on the screen in real time. We
used texture-mapped images on a wire-frame anatomical head model combined with synthetic speech to enhance realism.

4.1. RESULTS & DISCUSSION 

Using target image training, a parameter database that
maps phonemes to facial expressions was created. Files
containing speech broken into phonemes can be fed to
the system, which will fabricate the corresponding animation. Figures 1-3 show several frames from an
animation sequence of the word “teeth” [3].
Figure 1: First 2 key frames from the word “teeth”.


Figure 3: Middle 2 key frames from the word “teeth”. 



Figure 2: Ending 2 key frames from the word “teeth”. 


Unlike other optimization problems where the best
solution is required, here sub-optimal solutions often provide very decent results. Although distortion decreases with additional iterations, the remaining error may be less
noticeable to human eyes, particularly when the animation runs
in real time. In our preliminary experiments, 20-60
iterations generally produced synthesized facial images
that looked strikingly similar to the target image.

We also considered adding more control points to
improve lip shape resolution. While adding more control points permits a more accurate distinction between
images, it also increases the complexity of the system. There
was a consensus to keep five points so that real-time performance would not be handicapped. Other variations
of population selection, such as steady-state reproduction, were also evaluated but did not yield any significant improvements.
ABSTRACT

Color quantization of still images can be easily stated as
a clustering problem. Color quantization of sequences of
images becomes a non-stationary clustering problem. In
this paper we propose a very simple and effective
evolutive strategy to adaptively compute
the color representatives for each image in the
sequence. Salient features of the evolutive strategy
proposed here are: individuals correspond to individual
cluster centres; to approach real-time response we impose
one-generation adaptation for each image; only mutation
operators are applied; and these mutation operators are
guided by the actual covariance matrices of the clusters.
Experimental results on a sequence of indoor images are
presented.

1 INTRODUCTION 

Color Quantization [1, 2, 3, 4] is
an instance of the more general technique of Vector
Quantization [5] in the space of colors. Although it is
well known [6] that the Euclidean distance in the RGB
space does not preserve the perceptual distance between
colors, most of the approaches applied in practice and
reported in the literature work in the RGB space using the
Euclidean distance as the clustering similarity metric.
Some works [4] show that the tradeoff between
computational efficiency and visual quality justifies this
decision. In the work reported in this paper we stick
to this common practice. Color Quantization has
applications in visualization, color image segmentation,
data compression and image retrieval [7]. In visualization
and compression applications the typical size of the color
palette (codebook, color representatives) is 256, whereas
for segmentation and retrieval tasks the size of the color
palette is smaller. If the number of color representatives
is not set beforehand, the problem of discovering the
"natural" number of representatives becomes much more
involved, and it is a line of research of its own. In [8] we
reported the application of steady state genetic algorithms
to this kind of problem.


In this paper, the emphasis is on performing near real-time adaptive clustering for a non-stationary population.
Therefore, we consider the palette size fixed for each
experiment. Although sequences of images (video) lead
naturally to the consideration of time varying clustering
problems, the usual approaches consider time invariant
distributions of either colors [9] or image blocks [10],
and apply conventional clustering methods. This may be
due to the nature of the video sequences considered: most
of the works are applied to video recordings of talking
heads, which show little variation of the color distribution.
Our experimental work involves the color quantization of
a sequence of images that show a smooth but clear
variation over time of the distribution of colors. Our
experimental image sequence is a subsample of a panning
sequence of an indoor scene (looking around the
laboratory). Some heuristic efforts [11, 12] have been
reported that try to cope with the time varying
characteristics inherent to image sequences. Our approach
is to state the problem as a time-varying clustering
problem, and to propose an evolutive strategy as the
adaptation mechanism.

Evolutive strategies [13, 14, 15] have been developed
mainly by Schwefel since the sixties. They belong to the
broad class of algorithms inspired by natural selection:
genetic algorithms, genetic programming, etc. Under this
design philosophy (population based, fitness guided,
genetic-like operators) a vast host of algorithms have
been proposed and applied. The features most widely
accepted as characteristic of evolutionary strategies are:
(1) real-valued vector individuals, (2) mutation as the main genetic
operator, (3) individuals that contain local
information for mutation, so that adaptive strategies can
be formulated to self-regulate the mutation operator.
However, it is widely recognized [15] that many hybrid
algorithms can be defined, so that it is generally difficult
to assign a definitive "label" to a particular algorithm.
Nevertheless, we classify the algorithm employed here as
an evolutive strategy because it fits the above
characterization. Our work differs from previous attempts
[16] to apply evolutive strategies to clustering problems
in several technical points (definition of individuals,
fitness, selection) and in a major philosophical point: we
are looking for an adaptive strategy that can be applied in
near real time, whereas, for most of the evolutive/genetic
literature, the aim is to outperform other clustering
algorithms without any regard for time constraints.


The paper is organized as follows. Section 2 introduces
the time varying clustering notation. Section 3 discusses the
evolutive strategy used in the experimental work. Section
4 presents the experimental results, and Section 5 gives
our conclusions and lines for further work.


2 TIME VARYING CLUSTERING 

Cluster analysis and the related Vector Quantization
design problem are important techniques in many
engineering and scientific disciplines [5, 17, 18, 19, 20, 21].
Color Quantization is Vector Quantization in the RGB
unit cube. In their most usual formulation it is assumed
that the underlying stochastic process is stationary and
that a given set of sample vectors properly characterizes
this process. This paper addresses the case in which the
underlying stochastic process is time
dependent (as happens in general image
sequences), and proposes an evolutive strategy that can be
of use there. Another important assumption is that no
model for the time variation of the
population is known. If a model is known, a predictive
approach [5] would reduce the problem to a stationary
one.


A time variant formulation of the clustering problem
must start with the explicit assumption of a time varying
population described by a stochastic process $\{X_t;\ t = 0, 1, \ldots\}$ (note that we have jumped into
the discrete time case). In this framework, a working
definition of the time varying clustering problem could
read as follows: given a sequence of sets of vectors
$X(t) = \{\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)\}$, obtain a corresponding sequence of
partitions of each of them into disjoint clusters $\{\mathcal{X}_1(t), \ldots, \mathcal{X}_c(t)\}$ that minimizes a
criterion function $C = \{C(t);\ t = 0, 1, \ldots\}$. The related time
varying Vector Quantization problem can be stated as the search
for a sequence of representatives $Y(t) = \{\mathbf{y}_1(t), \ldots, \mathbf{y}_c(t)\}$
that minimizes the error function (distortion)
$E = \{E(t);\ t = 0, 1, \ldots\}$. Functions C and E coincide when the
criterion function is the within-cluster variance and the
error function is based on the Euclidean distance. Color
Quantization of image sequences is obviously a time
varying Vector Quantization problem. The stochastic
minimization problem that must be considered in order to
derive adaptive algorithms can be stated as follows:

$$\min_{\{Y(t)\}} \sum_{t \ge 0} \sum_{j=1}^{n} \sum_{i=1}^{c} \lVert \mathbf{x}_j(t) - \mathbf{y}_i(t) \rVert^2 \, \delta_{ij}(t)$$

$$\delta_{ij}(t) = \begin{cases} 1 & i = \arg\min_{k=1,\ldots,c} \lVert \mathbf{x}_j(t) - \mathbf{y}_k(t) \rVert^2 \\ 0 & \text{otherwise} \end{cases}$$

A reasonable simplifying assumption is that the
minimization of the sequence of time dependent error
functions can be done independently at each time step:

$$\min\{E(t);\ t = 0, 1, \ldots\} = \{\min E(0), \min E(1), \ldots\}$$

Under another reasonable assumption, that of smooth
(bounded) variation of the optimal set of representatives at
successive time steps (i.e., $\lVert \mathbf{y}_i(t) - \mathbf{y}_i(t-1) \rVert < \varepsilon$ for some small $\varepsilon$), the set
of representatives obtained after adaptation in a time step
could be used as the initial conditions for the next time
step. Smooth variation of color representatives can be
assumed for image sequences if the frame rate is high enough
and color persistence is relatively high. In the
experiments we work with a time subsample, so the
reader must keep in mind that smooth variation is an
assumption that may not be satisfied.

3 THE EVOLUTIVE STRATEGY


The idea behind evolutive strategies is to mimic natural
selection to solve difficult optimization problems. A
population of feasible solutions is proposed. The best
ones are selected (sometimes randomly) to build up a new
proposition. New solutions (children) are built up from
previous ones (parents) by the application of genetically
inspired operators: recombination and mutation. A widely
accepted pseudocode representation of the algorithm is as
follows [14]:

t := 0
initialize P(t)
evaluate P(t)
while not terminate do
    P'(t) := recombine P(t)
    P''(t) := mutate P'(t)
    evaluate P''(t)
    P(t+1) := select (P''(t) ∪ Q)
    t := t + 1
end while

The population P(t) is a set of solutions proposed at
generation t. The algorithm iterates until some time or
optimality condition is met. The evaluate operator
computes the objective function for each individual. The
recombine operator finds mates and recombines them,
producing a set of offspring P'(t). This operator is
seldom defined in evolutive strategies, and we have not
defined it in our strategy. The mutate operator produces
new individuals by the application of random
perturbations. Usually evolutive strategies define these
random perturbations as samples of normally distributed
random variables. The set Q can be either the set of
parents or the empty set, depending on the strategy.

In order to define an evolutive strategy, the first decision 
is the appropriate definition of the population. The 
common approach is to define each individual as a whole 
solution. In the case of Clustering, each individual would 
correspond to a partition of the sample represented by the 
cluster centres. The fitness of the individual could be 
straightforwardly defined as the objective function. 
Individuals would compete to survive as the best
solution. We have taken a radically different approach.
We have defined each individual as a single cluster center,
so that $P(t) = \{\mathbf{y}_1(t), \ldots, \mathbf{y}_c(t)\}$. The local fitness of the
individual is, then, its local distortion

$$F_i(t) = \sum_{j=1}^{n} \lVert \mathbf{x}_j(t) - \mathbf{y}_i(t) \rVert^2 \, \delta_{ij}(t).$$

The solution proposed at time t is given by the entire population. The
population as a whole can be evaluated to measure its
fitness

$$F(t) = \sum_{i=1}^{c} \sum_{j=1}^{n} \lVert \mathbf{x}_j(t) - \mathbf{y}_i(t) \rVert^2 \, \delta_{ij}(t),$$

which corresponds to the objective function to be minimized.
The risky hypothesis is that the local optimization of
individual cluster distortions will lead to the global
optimization of the entire set of cluster centres.
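The local fitness F_i(t) and the population fitness F(t) can be computed directly from the nearest-centre assignment δ_ij(t). The following minimal sketch works on RGB tuples in the unit cube; the helper names are illustrative.

```python
from typing import List, Sequence, Tuple

Colour = Tuple[float, float, float]  # an RGB vector in the unit cube

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(samples: Sequence[Colour], centres: Sequence[Colour]) -> List[int]:
    """delta_ij(t) in index form: the nearest centre for every sample x_j(t)."""
    return [min(range(len(centres)), key=lambda i: sq_dist(x, centres[i])) for x in samples]

def local_distortions(samples: Sequence[Colour], centres: Sequence[Colour]) -> List[float]:
    """F_i(t): squared error of the samples coded by each centre y_i(t)."""
    f = [0.0] * len(centres)
    for x, i in zip(samples, assign(samples, centres)):
        f[i] += sq_dist(x, centres[i])
    return f

def global_distortion(samples: Sequence[Colour], centres: Sequence[Colour]) -> float:
    """F(t) = sum_i F_i(t): the objective the whole population should minimize."""
    return sum(local_distortions(samples, centres))
```

Since F(t) is simply the sum of the F_i(t), each individual's local fitness is its share of the global objective, which is what makes the "local optimization leads to global optimization" hypothesis plausible.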

As said before, we have not defined any kind of
recombination operator. The mutation operator follows
the basic philosophy of evolutionary mutation operators:
it is a random perturbation that follows a normal
distribution. There are two design questions to answer at
this point: (1) which individuals will be mutated? and (2)
how many mutations will be allowed? The uniform
random selection of individuals to be mutated does not
seem reasonable. Therefore, we have decided to
perform a guided selection of the individuals to be
subjected to mutation: mutations are obtained
from the individuals with the highest distortion. We have
considered two possibilities: the selection of the
individuals whose local distortion is greater than the
mean of the local distortions in their generation, $S^{\mu}$, and the
selection of the individual with the highest local
distortion, $S^{\max}$. More formally:

$$S^{\mu}(t) = \{\mathbf{y}_i(t) \mid F_i(t) > \bar{F}(t)\}, \qquad \bar{F}(t) = \frac{1}{c} \sum_{i=1}^{c} F_i(t)$$

$$S^{\max}(t) = \{\mathbf{y}_k(t) \mid k = \arg\max_i \{F_i(t)\}\}$$

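As to the number of mutations, we have decided to
perform a fixed number of mutations in every case, so that
the number of mutations per individual depends on
the selection strategy chosen. The mutation itself is
performed by adding to the selected individuals pseudorandom
samples of a normal random variable:

$$P'(t) = \{\mathbf{a}_i = \mathbf{y}_i + \mathbf{u}_i;\ \mathbf{u}_i \sim N(\mathbf{0}, \hat{\Sigma}_i(t)) \mid \mathbf{y}_i \in S(t)\},$$

where S(t) is either $S^{\mu}(t)$ or $S^{\max}(t)$. The estimation of the covariance matrix is based on the
actual cluster elements assigned to the mutated individual:

$$\hat{\Sigma}_i(t) = (n_i - 1)^{-1} \sum_{j=1}^{n} (\mathbf{x}_j(t) - \mathbf{y}_i(t))(\mathbf{x}_j(t) - \mathbf{y}_i(t))^T \, \delta_{ij}(t),$$

where $n_i$ is the number of sample vectors assigned to $\mathbf{y}_i(t)$.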
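The guided mutation step can be sketched as follows: compute F_i(t), pick S^μ(t) or S^max(t), and add Gaussian noise whose covariance is estimated from the members of the mutated cluster. How the fixed number m of mutations is spread over several selected centres is not stated, so the round-robin rule here is an assumption, as are the fallback to the most distorted centre when S^μ(t) is empty and the small ridge term that keeps the Cholesky factor defined for degenerate clusters.

```python
import math
import random
from typing import List, Sequence

Vec = List[float]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def covariance(points: Sequence[Sequence[float]], centre: Sequence[float]) -> List[Vec]:
    """Sigma_i(t) = (n_i - 1)^-1 * sum over cluster members of (x - y_i)(x - y_i)^T."""
    d = len(centre)
    cov = [[0.0] * d for _ in range(d)]
    for x in points:
        diff = [a - b for a, b in zip(x, centre)]
        for r in range(d):
            for c in range(d):
                cov[r][c] += diff[r] * diff[c]
    scale = 1.0 / max(len(points) - 1, 1)   # guard for clusters with 0 or 1 members
    return [[v * scale for v in row] for row in cov]

def cholesky(a: Sequence[Sequence[float]], eps: float = 1e-9) -> List[Vec]:
    """Lower-triangular L with L L^T = a + eps*I (eps keeps degenerate clusters usable)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(max(s + eps, eps)) if i == j else s / low[j][j]
    return low

def mutate(centres: Sequence[Vec], samples: Sequence[Vec], m: int,
           rule: str, rng: random.Random) -> List[Vec]:
    """Produce m children a = y_i + u, u ~ N(0, Sigma_i), from the selected centres.

    rule "mean" -> S^mu (local distortion above the generation mean),
    rule "max"  -> S^max (the single most distorted centre).
    """
    idx = range(len(centres))
    labels = [min(idx, key=lambda i: sq_dist(x, centres[i])) for x in samples]
    members = [[x for x, lab in zip(samples, labels) if lab == i] for i in idx]
    f = [sum(sq_dist(x, centres[i]) for x in members[i]) for i in idx]   # F_i(t)
    worst = max(idx, key=f.__getitem__)
    if rule == "max":
        selected = [worst]
    else:
        mean_f = sum(f) / len(f)
        selected = [i for i in idx if f[i] > mean_f] or [worst]
    children = []
    for k in range(m):
        i = selected[k % len(selected)]                 # round-robin over S(t) (assumed)
        low = cholesky(covariance(members[i], centres[i]))
        z = [rng.gauss(0.0, 1.0) for _ in centres[i]]   # standard normal draws
        u = [sum(low[r][c] * z[c] for c in range(len(z))) for r in range(len(z))]
        children.append([y + du for y, du in zip(centres[i], u)])
    return children
```

Because the noise follows the cluster's own covariance, a cluster that is spread only in red and green produces children that move only in those directions.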

Finally, to define the selection of the next generation
individuals we have followed the so-called (λ+μ)-strategy.
We pool together parents and children:

$$P''(t) \cup Q = \{\mathbf{y}_1, \ldots, \mathbf{y}_c, \mathbf{y}_{c+1}, \ldots, \mathbf{y}_{c+m}\}$$

where m is the number of individuals generated by
mutation. The fitness function used for the selection of an
individual is the distortion obtained when the sample is coded
with the codebook given by ...; more formally:

The selection operator selects the c best individuals
according to the above fitness. To define the
selection operator formally, first consider the set

$$P'''(t) = \{\mathbf{y}_{i_1}, \ldots, \mathbf{y}_{i_{c+m}}\},$$

i.e., the pooled set $P''(t) \cup Q$ ordered from best to worst fitness.
Then the specification of the selection operator is
$P(t+1) = \{\mathbf{y}_{i_k} \in P'''(t);\ k = 1, \ldots, c\}$. This selection involves
the fitness of the whole population with the addition of
the mutations generated. This makes the algorithm
sensitive to the number of mutations generated, which forces
the above-mentioned restriction to a fixed number of
them.
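The (λ+μ) pooling and truncation to c survivors can be sketched as below. The exact fitness formula is not recoverable from the text, so this sketch substitutes a hypothetical one: each pooled individual is scored by how much the sample distortion grows when it is removed from the pooled codebook, and the c highest-scoring individuals survive. Treat it as a stand-in, not as the authors' operator.

```python
from typing import List, Sequence, Tuple

Colour = Tuple[float, float, float]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distortion(samples: Sequence[Colour], codebook: Sequence[Colour]) -> float:
    """Total squared error when each sample is coded by its nearest codeword."""
    return sum(min(sq_dist(x, y) for y in codebook) for x in samples)

def plus_selection(parents: Sequence[Colour], children: Sequence[Colour],
                   samples: Sequence[Colour], c: int) -> List[Colour]:
    """Keep c of the c + m pooled individuals (parents and children together).

    ASSUMPTION: an individual's fitness is the distortion increase caused by removing
    it from the pooled codebook; the c individuals whose removal hurts most survive.
    """
    pool = list(parents) + list(children)
    if not 1 <= c < len(pool):
        raise ValueError("need 1 <= c < pool size")
    base = distortion(samples, pool)

    def removal_cost(k: int) -> float:
        return distortion(samples, pool[:k] + pool[k + 1:]) - base

    ranked = sorted(range(len(pool)), key=removal_cost, reverse=True)
    return [pool[k] for k in ranked[:c]]
```

Scoring each individual against the whole pool reflects the remark above that selection involves the fitness of the whole population plus the generated mutations; it also shows why the cost grows with m, since every extra child adds one more distortion evaluation over the sample.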

The last critical decision in the design of the evolutive 
strategy is the mapping of the generation number into the 
frame number of the image sequence. The more 
"conventional" approach would consist of computing 
several generations for each frame. Given our desire to 
approach almost real time performance, we have also 
made a critical decision at this point. We have defined a 
one to one mapping. That is, for each image only one 
generation of the evolutive strategy is performed. In other 
words, t is the image number in the sequence. 

4 THE EXPERIMENT 

The sequence of images used for the experiment is a
panning of the laboratory taken with an electronic Apple
QuickTake camera. Original images have a spatial
resolution of 480x640 pixels. Each two consecutive
images overlap by 50% of the scene. Figure 1 shows the
distribution of the pixels in the RGB unit cube for some
of the images in the sequence. This representation clearly
demonstrates the time varying nature of the data.






Figure 1. Distribution of pixel colors of some of the 
images in the experimental sequence 

Figures 2 to 5 show the results of the application of the 
evolutive strategy to random samples of 1600 pixels of 
each image. As a reference algorithm we have used a 
variation of the algorithm proposed by Heckbert [1] as 
implemented in MATLAB. This algorithm recursively 
partitions the RGB unit cube along the axis of maximum 
variance. The partition is performed by an orthogonal 
plane chosen so as to minimize the sum of the residual 
variances [22]. Color representatives are computed as the 
center of mass of the resulting partition cubes. This 
algorithm has been applied in two ways. First it has been 
applied independently to each image (dotted line). Second,
the set of color representatives obtained with the Heckbert 
algorithm for the first image has been used to color 
quantize the entire sequence (dashed line). The difference 
between both lines gives another view of the time 
varying nature of the data. As the goal is to show the 
adaptive properties of the proposed evolutive strategy, we 
have used as initial cluster centres the first Heckbert 
palette. The solid line in the figures shows the mean 
result of 30 replications of the evolutive strategy along 
with the 95% confidence intervals. 


5 CONCLUSIONS AND FURTHER WORK 

We have proposed an evolutive strategy for the adaptive
computation of color representatives for Color
Quantization that can be very efficiently implemented and
reach almost real time performance for highly variable
color populations. Experimental results show a good
response for a realistic sequence of images. Further work
will be addressed to defining more robust mutation operators
that could reduce the variance of the results. In particular,
deterministic mutation operators seem to be desirable.



Figure 2: Results of the evolutive strategy with c=16, 
m=16, and selection of the mutated individuals. 



Figure 3: Results of the evolutive strategy with c=16, 
m=16, and selection of the mutated individuals. 



Figure 4: Results of the evolutive strategy with c=256,
m=128, and selection of the mutated individuals.
ABSTRACT

In this work a new method to design filter banks with
rational decimation factors is proposed. It aims at
the cancellation of the main component of aliasing in
the output signal; this imposes a set of conditions on
the filters of the analysis/synthesis banks. If cosine
modulation of different linear-phase prototypes is used,
the aliasing cancellation condition makes the prototypes
of adjacent branches dependent on each other. A procedure to design the prototypes based on these constraints is proposed, and examples of cosine-modulated non-uniform filter banks are
presented.


1. INTRODUCTION 

Splitting the spectrum of a digital signal can be useful
in several applications, for example data compression.
Most of the literature on subband coding filter bank design is concerned with uniform-width subbands. However, in some cases a non-uniform splitting
is more suitable, for example in audio coding [1], where
non-uniform-width subbands can better match the
critical bands of the human auditory system.

The problem of designing non-uniform filter banks
has been addressed, for example, in [2]-[5]. In this work,
filter banks with rational decimation factors are considered, thus extending the work done in [6], which deals only with
integer decimation factors.

The method can be considered a Near Perfect Reconstruction (Near-PR) one, since it is based on the
cancellation of the main component of aliasing, as in
Pseudo-QMF banks [7]. In banks with rational decimation
factors, however, more than one coupling of
the aliasing components of adjacent branches can lead
to their cancellation. If cosine modulation is
used, the aliasing cancellation constraints involve the
prototype filters of each branch. A design procedure
is proposed, and numerical examples are presented to
show the effectiveness of the method.


Figure 1: Non-uniform bank with rational decimation
factors (input x(n), output x̂(n))


2. ALIASING CANCELLATION IN FILTER 
BANKS WITH RATIONAL SAMPLING 
FACTORS 


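Consider the system in Fig. 1, where a non-uniform
bank having rational sampling factors $R_m/M_m$, m =
0, ..., M-1, is depicted. The input-output relationship
in the z-domain is given by:

$$\hat{X}(z) = \sum_{m=0}^{M-1} \frac{1}{R_m M_m} \sum_{p=0}^{R_m-1} F_m(z^{1/R_m} W_{R_m}^{p})\, H_m(z^{1/R_m} W_{R_m}^{p})\, X(z) + \sum_{m=0}^{M-1} \frac{1}{R_m M_m} \sum_{l=1}^{M_m-1} \sum_{p=0}^{R_m-1} F_m(z^{1/R_m} W_{R_m}^{p})\, H_m(z^{1/R_m} W_{R_m}^{p} W_{M_m}^{l})\, X(z W_{M_m}^{l R_m}) \qquad (1)$$

where $W_M = e^{-j2\pi/M}$. Eq. (1) highlights the reconstruction transfer function (first term) and the aliasing components (second term).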

Consider the analysis stage of each branch shown in
Fig. 1. Real-coefficient filters are considered
and, therefore, the frequency response of each filter has
passbands located at positive and negative frequencies,
symmetric with respect to the origin. The passband
at positive (negative) frequencies has width $\pi/M_m$ and
is centered at $(k_m + 0.5)\pi/M_m$ ($-(k_m + 0.5)\pi/M_m$).
The value $k_m$ is an integer and selects which part of the
spectrum of the $R_m$-fold upsampled input signal must
be extracted. For example, to extract the spectrum in
the frequency interval $[2\pi/5, 3\pi/5]$ ($[-3\pi/5, -2\pi/5]$) we
use $M_m = 5$ and $k_m = 2$. If we consider the frequency response of each filter of the analysis/synthesis
banks to be approximately equal to zero in its stopbands,
then the filter transfer functions can be expressed as:

$$H_m(z) = U_m(z) + V_m(z) \qquad (2)$$

$$F_m(z) = \bar{U}_m(z) + \bar{V}_m(z) \qquad (3)$$

where $U_m(z)$ and $\bar{U}_m(z)$ have a passband for $\omega > 0$,
while $V_m(z)$ and $\bar{V}_m(z)$ have a passband for $\omega < 0$.

Due to the $M_m$-fold upsampler in the synthesis stage,
images of the m-th subband spectrum are filtered by
$F_m(z)$. The main aliasing terms are created at the
high-frequency and at the low-frequency edges of the
passband of $F_m(z)$. These components have been described for a cosine-modulated uniform bank in [7]. If
we consider that in a bank with rational decimation factors
each branch operates on an $R_m$-fold upsampled version
of x(n), and we retain only the most relevant terms,
then the aliasing terms can be written as:

$$A_m^{(high)}(z) = \frac{1}{M_m} \Big[ \bar{U}_m(z)\, V_m(z W_{M_m}^{k_m+1})\, X(z^{R_m} W_{M_m}^{(k_m+1)R_m}) + \bar{V}_m(z)\, U_m(z W_{M_m}^{-(k_m+1)})\, X(z^{R_m} W_{M_m}^{-(k_m+1)R_m}) \Big] \qquad (4)$$

In [7] it is shown that for uniform cosine-modulated filter banks the component $A_m^{(high)}(z)$ of the m-th branch
is canceled by the component $A_{m+1}^{(low)}(z)$ of the (m+1)-th branch. In the non-uniform case, we have to consider that the cancellation may also occur by coupling
the (high)-(high), (low)-(low) or (low)-(high) aliasing
terms coming from the m-th and the (m+1)-th branch,
i.e., the following cases must be taken into account:

a) $A_m^{(high)}(z)\downarrow_{R_m} + A_{m+1}^{(high)}(z)\downarrow_{R_{m+1}} = 0$

b) $A_m^{(high)}(z)\downarrow_{R_m} + A_{m+1}^{(low)}(z)\downarrow_{R_{m+1}} = 0$

c) $A_m^{(low)}(z)\downarrow_{R_m} + A_{m+1}^{(low)}(z)\downarrow_{R_{m+1}} = 0$

d) $A_m^{(low)}(z)\downarrow_{R_m} + A_{m+1}^{(high)}(z)\downarrow_{R_{m+1}} = 0$

where $G(z)\downarrow_M$ stands for the z-transform of the M-fold subsampled version of g(n). For example, consider
the bank {1/5, 3/5, 1/5}, which can be implemented using filters having a passband equal to $\pi/5$ and centered, on the positive frequency axis, at $\pi/10$, $\pi/2$ and
$9\pi/10$. The aliasing term $A_0^{(high)}(z)$ produced in the
m = 0 branch at the synthesis stage must be canceled
by $A_1^{(high)}(z)\downarrow_3$.

Consider the (high)-(high) case. Substituting (5) into the case a) equation yields an expression that can be split into two systems. If the first condition holds, system (6) must be verified; otherwise, system (7) must be verified.

(6)

(7)

Similar equations can be written also for the other 
possible couplings of aliasing components. 

3. COSINE-MODULATED NON-UNIFORM BANKS

The use of cosine modulation simplifies the fulfillment of the aliasing cancellation condition. Suppose that each filter of the analysis/synthesis banks is obtained as follows:

$$h_m(n) = 2g_m(n)\cos\left((2k_m+1)\frac{\pi}{2M_m}\left(n - \frac{N_m-1}{2}\right) + \theta_m\right)$$

$$f_m(n) = 2g_m(n)\cos\left((2k_m+1)\frac{\pi}{2M_m}\left(n - \frac{N_m-1}{2}\right) - \theta_m\right) = h_m(N_m - 1 - n) \qquad (8)$$

for $m = 0, 1, \ldots, M-1$, where $N_m$ is the length of $g_m(n)$. The prototypes $g_m(n)$ have linear phase and satisfy $g_m(n) = g_m(N_m - 1 - n)$. The phase terms $\theta_m$ are chosen to satisfy the aliasing cancellation constraints.
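The modulation in (8) can be checked numerically. The sketch below is a minimal illustration, not the authors' design code. It assumes a hypothetical Hann-windowed sinc prototype with cut-off $\pi/(2M_m)$; any symmetric prototype would do. It uses NumPy to build $h_m(n)$ and $f_m(n)$ for the middle branch of the {1/5, 3/5, 1/5} bank, then confirms two things: that $f_m(n) = h_m(N_m-1-n)$, and that the analysis filter's peak falls inside the band centered at $(k_m+0.5)\pi/M_m$.

```python
import numpy as np

def windowed_sinc_prototype(N, M):
    """Hypothetical symmetric lowpass prototype with cut-off pi/(2M)."""
    n = np.arange(N)
    c = (N - 1) / 2
    # ideal lowpass with cut-off wc = pi/(2M): (wc/pi) * sinc(wc (n - c) / pi)
    g = np.sinc((n - c) / (2 * M)) / (2 * M)
    return g * np.hanning(N)  # symmetric window keeps linear phase

def cosine_modulate(g, k, M, theta):
    """Analysis and synthesis filters of eq. (8) for one branch."""
    N = len(g)
    n = np.arange(N)
    arg = (2 * k + 1) * np.pi / (2 * M) * (n - (N - 1) / 2)
    h = 2 * g * np.cos(arg + theta)  # analysis filter h_m(n)
    f = 2 * g * np.cos(arg - theta)  # synthesis filter f_m(n)
    return h, f

# Middle branch of the {1/5, 3/5, 1/5} bank: M_m = 5, k_m = 2, theta_m = pi/4.
g = windowed_sinc_prototype(N=101, M=5)
h, f = cosine_modulate(g, k=2, M=5, theta=np.pi / 4)

H = np.abs(np.fft.rfft(h, 4096))      # |H_m(e^{jw})| on 0 <= w <= pi
w = np.linspace(0.0, np.pi, len(H))
print(np.allclose(f, h[::-1]))        # True: f_m(n) = h_m(N_m - 1 - n)
print(w[np.argmax(H)] / np.pi)        # peak position, in units of pi
```

Because the prototype is symmetric, the time-reversal relation holds for any choice of $\theta_m$.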
In the case of cosine-modulated banks, the following relationships hold:

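$$U_m(z) = e^{j\theta_m}\, W_{2M_m}^{(k_m+1/2)(N_m-1)/2}\, G_m\big(zW_{2M_m}^{k_m+1/2}\big)$$

$$V_m(z) = e^{-j\theta_m}\, W_{2M_m}^{-(k_m+1/2)(N_m-1)/2}\, G_m\big(zW_{2M_m}^{-(k_m+1/2)}\big)$$

$$\bar{U}_m(z) = e^{-j\theta_m}\, W_{2M_m}^{(k_m+1/2)(N_m-1)/2}\, G_m\big(zW_{2M_m}^{k_m+1/2}\big)$$

$$\bar{V}_m(z) = e^{j\theta_m}\, W_{2M_m}^{-(k_m+1/2)(N_m-1)/2}\, G_m\big(zW_{2M_m}^{-(k_m+1/2)}\big) \qquad (9)$$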

Consider, for example, the (high)-(high) case. Substituting the above expressions into the aliasing cancellation constraints (6) and (7) yields a relationship between the prototypes of adjacent branches. In [8] it is shown that the choice $\theta_{m+1} - \theta_m = 0$ allows aliasing cancellation if the following relationship holds. The same result is obtained for cases b)-d), but with a different relationship between the phase terms $\theta_m$:


(10)


Moreover, the following facts can be demonstrated [8]. They also outline the steps of the procedure for designing filter banks with rational sampling factors.

By using the zero-phase representation of the prototypes, i.e.,

$$G_m(e^{j\omega}) = e^{-j\omega(N_m-1)/2}\, G^{(0)}_m(\omega) \qquad (11)$$

and by imposing the condition

$$\frac{N_m - 1}{R_m} = \frac{N_{m+1} - 1}{R_{m+1}} \qquad (12)$$

on the lengths of the prototypes, the aliasing cancellation constraint reduces to the following relationship between the zero-phase frequency responses of the prototype filters:

(13)


Let $\omega_{c,m} = \pi/(2M_m)$ be the cut-off frequency of $G_m(\omega)$. Suppose the transition band, of width $\Delta\omega_m$, is centered at $\omega_{c,m}$. Let $\omega_{p,m} = \omega_{c,m} - (\Delta\omega_m/2)$ and $\omega_{s,m} = \omega_{c,m} + (\Delta\omega_m/2)$ be, respectively, the upper bound of the passband and the lower bound of the stopband of $G_m(\omega)$. Then constraint (13) is satisfied if $R_m\Delta\omega_m = R_{m+1}\Delta\omega_{m+1}$ and if $G_{m+1}(\omega)$ is chosen as follows:


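$$G^{(0)}_{m+1}(\omega) = \begin{cases} 0, & -\pi \le \omega < -\omega_{s,m+1} \\ G^{(0)}_m\!\left(-\omega_{p,m} + (\omega + \omega_{p,m+1})\dfrac{\Delta\omega_m}{\Delta\omega_{m+1}}\right), & -\omega_{s,m+1} \le \omega < -\omega_{p,m+1} \\ \text{passband gain}, & -\omega_{p,m+1} \le \omega < \omega_{p,m+1} \\ G^{(0)}_m\!\left(\omega_{p,m} + (\omega - \omega_{p,m+1})\dfrac{\Delta\omega_m}{\Delta\omega_{m+1}}\right), & \omega_{p,m+1} \le \omega < \omega_{s,m+1} \\ 0, & \omega_{s,m+1} \le \omega \le \pi \end{cases} \qquad (14)$$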
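The construction in (14) maps each transition band of $G^{(0)}_{m+1}$ linearly onto the corresponding transition band of $G^{(0)}_m$. The sketch below makes several assumptions:

- zero-phase responses are real and even;
- $G^{(0)}_m$ has a hypothetical sine-shaped transition;
- the mapping uses $\Delta\omega_m/\Delta\omega_{m+1} = R_{m+1}/R_m$, which follows from $R_m\Delta\omega_m = R_{m+1}\Delta\omega_{m+1}$;
- the passband value is a parameter, because the gain is not specified here.

The numbers correspond to branches 0 and 1 of Example 1. Function names are illustrative.

```python
import math

def sine_transition_response(M, dw, gain=1.0):
    """Hypothetical zero-phase prototype G_m^(0)(w): flat passband, sine-shaped
    transition of width dw centred at the cut-off pi/(2M), zero stopband."""
    wc = math.pi / (2 * M)
    wp, ws = wc - dw / 2, wc + dw / 2

    def G(w):
        a = abs(w)  # real, even zero-phase response
        if a <= wp:
            return gain
        if a < ws:
            return gain * math.cos(0.5 * math.pi * (a - wp) / dw)
        return 0.0
    return G, wp, ws

def next_prototype(G_m, wp_m, R_m, M_next, R_next, dw_next, gain=1.0):
    """Derive G_{m+1}^(0)(w) from G_m^(0)(w) following the structure of (14)."""
    wc = math.pi / (2 * M_next)
    wp, ws = wc - dw_next / 2, wc + dw_next / 2
    scale = R_next / R_m  # equals dw_m / dw_next when R_m dw_m = R_next dw_next

    def G_next(w):
        a = abs(w)
        if a < wp:
            return gain
        if a < ws:
            # map [wp_next, ws_next) linearly onto [wp_m, ws_m)
            return G_m(wp_m + (a - wp) * scale)
        return 0.0
    return G_next, wp, ws

# Example 1, branches 0 and 1: R_0 = 1, R_1 = 3, M_0 = M_1 = 5.
R0, R1, M = 1, 3, 5
dw0 = 0.12
dw1 = R0 * dw0 / R1  # transition widths satisfy R_0 dw_0 = R_1 dw_1
G0, wp0, ws0 = sine_transition_response(M, dw0)
G1, wp1, ws1 = next_prototype(G0, wp0, R0, M, R1, dw1)

wc = math.pi / (2 * M)
print(round(G1(wc), 6))  # 0.707107, the value G0 takes at its own cut-off
```

The derived response inherits the transition shape of the first prototype, only compressed by the ratio of the upsampling factors. As a result, it remains power complementary around its own cut-off.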


Assuming the aliasing components have been completely eliminated, the input-output relationship shown in (1) can be expressed by

$$\hat{X}(z) = T(z)X(z) \qquad (15)$$

Phase error is absent if the synthesis filters are time-reversed versions of the analysis filters. The magnitude error stays low if $T(z)$ is approximately allpass. The reconstruction error is reduced by choosing prototype filters with high stopband attenuation and a proper behavior in the transition band.

A first prototype is designed using known techniques, for example those shown in [9][10][5]. This prototype belongs to the $\bar{m}$-th branch, where $\bar{m}$ must be chosen so that … ($k = 0, \ldots, M-1$). Its cut-off frequency is $\omega_{c,\bar{m}} = \pi/(2M_{\bar{m}})$, and … is the gain in the passband. $G_m(\omega)$ must have a power-complementary transition band. That is, $|G_m(\omega)|^2 + |G_m(\pi/M_m - \omega)|^2$ must equal the squared passband gain for $\omega_{p,m} < \omega < \omega_{s,m}$.
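The power-complementarity requirement can be checked on a frequency grid. This sketch assumes a real zero-phase response given as a Python function and compares two hypothetical transition shapes:

- a sine-shaped transition, which satisfies the condition;
- a straight-line transition, which does not.

```python
import math

def is_power_complementary(G, M, wp, ws, gain=1.0, points=200, tol=1e-9):
    """Check |G(w)|^2 + |G(pi/M - w)|^2 == gain^2 on interior grid points of (wp, ws)."""
    for i in range(1, points):
        w = wp + (ws - wp) * i / points
        if abs(G(w) ** 2 + G(math.pi / M - w) ** 2 - gain ** 2) > tol:
            return False
    return True

def make_response(M, dw, shape):
    """Zero-phase response with unit passband, a transition of width dw centred
    at pi/(2M) whose profile is shape(x) for x in [0, 1], and zero stopband."""
    wc = math.pi / (2 * M)
    wp, ws = wc - dw / 2, wc + dw / 2

    def G(w):
        a = abs(w)
        if a <= wp:
            return 1.0
        if a >= ws:
            return 0.0
        return shape((a - wp) / dw)  # position inside the transition band
    return G, wp, ws

def sine_shape(x):
    return math.cos(0.5 * math.pi * x)

def linear_shape(x):
    return 1.0 - x

for name, shape in (("sine", sine_shape), ("linear", linear_shape)):
    G, wp, ws = make_response(M=5, dw=0.1, shape=shape)
    print(name, is_power_complementary(G, 5, wp, ws))
# sine True
# linear False
```

The sine profile works because the frequency $\pi/M - \omega$ sits at the mirrored position $1 - x$ inside the transition band, and $\cos^2(\pi x/2) + \sin^2(\pi x/2) = 1$.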

The prototypes in the other branches are obtained by using (14) and adding the correct linear phase term. $g_{m+1}(n)$ is then obtained by inverting $G_{m+1}(\omega)$.

It can be shown that prototypes designed with this procedure make $T(z)$ approximately allpass.
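How close $T(z)$ is to allpass can be quantified by the peak-to-peak deviation of $|T(e^{j\omega})|$ over $0 \le \omega \le \pi$, which is the $E_{p-p}$ measure used in (19). Below is a minimal NumPy sketch. It assumes the overall distortion function is available as an FIR impulse response `t`. This input is hypothetical: in practice it would be formed from the designed analysis and synthesis filters.

```python
import numpy as np

def peak_to_peak_distortion(t, n_fft=4096):
    """E_pp = max|T(w)| - min|T(w)| on a uniform grid of [0, pi]."""
    T = np.abs(np.fft.rfft(t, n_fft))  # samples of |T(e^{jw})| for 0 <= w <= pi
    return float(T.max() - T.min())

print(peak_to_peak_distortion([0, 0, 1.0]))  # pure delay: allpass, E_pp close to 0
print(peak_to_peak_distortion([1.0, 0.5]))   # |1 + 0.5 e^{-jw}| spans 0.5..1.5
```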

4. EXPERIMENTAL RESULTS

To show the effectiveness of the design procedure described in the previous section, we consider three examples of non-uniform banks. Examples 1 and 2 are banks with rational sampling factors. Example 3 is a bank with integer sampling factors, suitable for audio coding applications, that was proposed in [11]. We denote by $K$ and $\Theta$ the sets $\{k_m, m = 0, \ldots, M-1\}$ and $\{\theta_m, m = 0, \ldots, M-1\}$, respectively.

Example 1: Bank {1/5, 3/5, 1/5}. Two prototypes need to be designed ($g_0(n) = g_2(n)$). The couplings of aliasing components that must be considered are (high)-(high) between branches 0 and 1, and (low)-(low) between branches 1 and 2. $K = \{0, 2, 4\}$; $\Theta = \{\pi/4, \pi/4, \pi/4\}$.

Example 2: Bank {2/7, 2/7, 2/7, 1/7}. Two prototypes have to be designed ($g_0(n) = g_1(n) = g_2(n)$). In this example more than one choice is possible for $K$. We use $K = \{0, 5, 4, 6\}$ to show the largest variety of couplings of aliasing components: (high)-(high), (low)-(high) and (low)-(low), in that order. In this case $\Theta = \{\pi/4, \pi/4, -\pi/4, -\pi/4\}$.
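The role of $k_m$ and the resulting couplings can be checked with exact arithmetic. The sketch rests on two assumptions:

- an $R_m$-fold expander maps an upsampled-domain frequency $\omega$ to the original frequency $R_m\omega$ modulo $2\pi$;
- a band reached through its negative-frequency image has its edges swapped.

Under these assumptions it reproduces the couplings listed for Examples 1 and 2. The function names are illustrative only.

```python
from fractions import Fraction

def original_band(R, M, k):
    """Band of the original signal (in units of pi) picked up by a filter whose
    positive passband is [k*pi/M, (k+1)*pi/M] after R-fold upsampling.
    Returns (low, high, mirrored); mirrored=True means the band is reached
    through the negative-frequency image, so the filter's high edge maps to
    the band's low edge."""
    lo = Fraction(R * k, M)
    hi = Fraction(R * (k + 1), M)
    shift = 2 * (lo // 2)  # reduce modulo 2, i.e. modulo 2*pi
    lo, hi = lo - shift, hi - shift
    if hi <= 1:
        return lo, hi, False
    if lo >= 1 and hi <= 2:
        return 2 - hi, 2 - lo, True
    raise ValueError("passband straddles 0 or pi; choose another k")

def couplings(R, M, K):
    """Which edges (high/low) of adjacent branches meet at each band boundary."""
    bands = [original_band(r, m, k) for r, m, k in zip(R, M, K)]
    result = []
    for (lo0, hi0, mir0), (lo1, hi1, mir1) in zip(bands, bands[1:]):
        if hi0 != lo1:
            raise ValueError("bands must tile [0, pi]")
        upper_edge_of_m = "low" if mir0 else "high"
        lower_edge_of_next = "high" if mir1 else "low"
        result.append(f"({upper_edge_of_m})-({lower_edge_of_next})")
    return bands, result

# Example 1: bank {1/5, 3/5, 1/5}, K = {0, 2, 4}
print(couplings(R=[1, 3, 1], M=[5, 5, 5], K=[0, 2, 4])[1])
# ['(high)-(high)', '(low)-(low)']
# Example 2: bank {2/7, 2/7, 2/7, 1/7}, K = {0, 5, 4, 6}
print(couplings(R=[2, 2, 2, 1], M=[7, 7, 7, 7], K=[0, 5, 4, 6])[1])
# ['(high)-(high)', '(low)-(high)', '(low)-(low)']
```

The same check also shows why more than one $K$ is admissible in Example 2. For instance, $k_1 = 1$ would select the direct band $[2\pi/7, 4\pi/7]$ instead of its image.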

Example 3: Non-uniform bank with possible decimation factors of 16, 32 and 64, which splits an audio signal sampled at 48 kHz as shown in Fig. 2.


Figure 2: Subband splitting relative to Example 3

The performance of the presented design method is evaluated in terms of both the overall distortion function $T(\omega)$ and the residual aliasing error. For the latter, this work uses a global measure relative to the whole structure. According to the input-output relationship in (1), the aliasing contribution relative to the $l$-th term of the $m$-th branch can be written as

$$A_{l,m}(\omega) = \ldots \qquad (17)$$

with $m = 0, \ldots, M-1$ and $l = 0, \ldots, M_m - 1$. The functions $A_{l,m}(\omega)$ are $2\pi$-periodic. All the aliasing terms that refer to the same shifted version of $X(z)$, i.e., having the same value of $(lR_m) \bmod M_m$, must be summed up, so that the following aliasing error can be defined:

$$E_a(\omega) = \sum_{r=1}^{M_{max}-1} \Bigg| \sum_{m=0}^{M-1} \sum_{\substack{l=1 \\ (lR_m) \bmod M_m = r}}^{M_m-1} A_{l,m}(\omega) \Bigg|^2 \qquad (18)$$

where $M_{max} = \max\{M_m, m = 0, \ldots, M-1\}$. The inner summation in (18) is evaluated only for the values of $l$ and $m$ satisfying the condition $(lR_m) \bmod M_m = r$. Therefore

$$E_{p-p} = \max_{0 \le \omega \le \pi} |T(\omega)| - \min_{0 \le \omega \le \pi} |T(\omega)|, \qquad E_{a,max} = \max_{0 \le \omega \le \pi} E_a(\omega) \qquad (19)$$

can be used as measures of the quality of the designed banks.

Tables 1 and 2 report the results obtained for Examples 1 and 2, respectively, for different prototype lengths. Both the magnitude distortion and the aliasing error are kept small.

Fig. 3 shows the frequency responses of the final cosine-modulated analysis filters for Example 2, obtained with prototypes of 82 and 163 coefficients. The figure shows that the design based on (14) does not degrade the passband and stopband characteristics of the new prototypes.

Fig. 4 shows the final bank for Example 3, obtained with filters of length 512. The reconstruction and aliasing errors are $E_{p-p}$ = 3.88E-03 and $E_{a,max}$ = 5.99E-03, respectively.

5. CONCLUSIONS 

In this work, a method to design non-uniform filter banks with rational sampling factors has been presented. Aliasing cancellation constraints have been applied to cosine-modulated banks. The proposed procedure is simple: only one prototype requires numerical optimization, and the others are derived from it in a straightforward way.
Table 1: Results relative to Example 1

N0, N2   N1    Ep-p       Ea,max
36       106   3.42E-03   1.82E-02
46       136   4.33E-03   3.79E-03
56       166   1.52E-03   2.93E-04


Table 2: Results relative to Example 2

N0, N1, N2   N3   Ep-p       Ea,max
67           34   6.67E-02   3.00E-02
83           42   3.06E-02   1.20E-02
123          62   6.45E-03   4.36E-03
163          82   4.04E-03   9.15E-04
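Read condition (12) as requiring $(N_m - 1)/R_m$ to be equal across adjacent branches. Under that reading, the prototype lengths reported in Tables 1 and 2 can be checked directly. This is only a consistency check, using the table values as given:

```python
from fractions import Fraction

def lengths_satisfy_condition(N, R):
    """True if (N_m - 1) / R_m is the same for every branch (condition (12))."""
    return len({Fraction(n - 1, r) for n, r in zip(N, R)}) == 1

# Table 1 (Example 1, R = 1, 3, 1): rows of (N0 = N2, N1)
table1 = [(36, 106), (46, 136), (56, 166)]
# Table 2 (Example 2, R = 2, 2, 2, 1): rows of (N0 = N1 = N2, N3)
table2 = [(67, 34), (83, 42), (123, 62), (163, 82)]

print(all(lengths_satisfy_condition([a, b, a], [1, 3, 1]) for a, b in table1))        # True
print(all(lengths_satisfy_condition([a, a, a, b], [2, 2, 2, 1]) for a, b in table2))  # True
```

For instance, the first row of Table 1 gives $35/1 = 105/3 = 35$. The first row of Table 2 gives $66/2 = 33/1 = 33$.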



Figure 3: Cosine-modulated bank relative to Example 2 ($N_0 = N_1 = N_2 = 163$, $N_3 = 82$)



Figure 4: Filter bank relative to Example 3 ($N = 512$)
ABSTRACT

This contribution presents a mathematical scheme for simultaneous signal compression and noise reduction. We first propose well-localized wavelets, derived from the general theory of frames [1, 2, 3, 4], to generate a representation subspace that reproduces the original signal while excluding the additive noise.

This representation subspace is efficient for noise reduction, but in its initial form it creates an ill-conditioned inverse problem. The cause is the norm of the wavelet expansion coefficients, which may be very large in magnitude. We show that the ill-conditioned problem can be avoided simply by adopting an orthonormal representation for the wavelet-generated subspace. The mathematical framework of our approach allows us to construct this orthonormal representation explicitly and in a natural way. The new representation preserves the signal norm and makes the representation more compact, which improves its compression properties.


1. FRAMEWORK

To represent a signal $f(t)$, we fix a subspace $S$ by choosing a finite set of wavelet functions, such as those proposed in [4], p. 79 (see Fig. 1), and expand the signal as:

$$f(t) = \sum_{m \in Z_M} \sum_{n \in Z_N} c_{m,n}\, \phi_{m,n}(t) \qquad (1)$$

where

$$Z_M = \{m \in \mathbb{Z};\ m_1 \le m \le m_2;\ M = |m_2 - m_1 + 1|\} \qquad (2)$$

$$Z_N = \{n \in \mathbb{Z};\ n_1 \le n \le n_2;\ N = |n_2 - n_1 + 1|\} \qquad (3)$$

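The transformation (1) has no inverse in a formal sense. However, a minimum-norm solution can be found as:

$$c_{m,n} = \sum_{l=1,\ \lambda_l \neq 0}^{MN} \frac{1}{\lambda_l}\, \langle m,n|\psi_l\rangle \sum_{k \in Z_M}\sum_{i \in Z_N} \langle \psi_l|k,i\rangle\, \langle \phi_{k,i}|f\rangle \qquad (4)$$

where the vectors $|\psi_l\rangle$ satisfy the eigenvalue equations [6]:

$$\sum_{k \in Z_M}\sum_{i \in Z_N} \langle \phi_{m,n}|\phi_{k,i}\rangle\, \langle k,i|\psi_l\rangle = \lambda_l\, \langle m,n|\psi_l\rangle, \quad l = 1, \ldots, MN;\ m \in Z_M;\ n \in Z_N \qquad (5)$$

The inner products $\langle \phi_{m,n}|\phi_{k,i}\rangle$ are computed in $L^2([0,T])$, and the inner products $\langle m,n|\psi_l\rangle$ in the space of square-summable sequences, with $\langle m,n|k,i\rangle = \delta_{m,k}\,\delta_{n,i}$.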

To determine more accurately which eigenvalues should be considered nonzero, we do not solve (5). Instead, we directly calculate the singular values $\sigma_l = \sqrt{\lambda_l}$ and singular vectors $|\psi_l\rangle$ of the matrix $\phi_{m,n}(t_j)$, with $m \in Z_M$, $n \in Z_N$ and $j = 1, \ldots, N_t$. Here $N_t$ is the number of samples used to discretize the interval $[0,T]$.

Unfortunately, as (4) shows, when the spectrum $\{\lambda_l\}_{l=1}^{MN}$ decays fast, the coefficients $c_{m,n}$ become large in magnitude and the representation (1) becomes "non-economical". This problem is easily overcome by noticing that the eigenvectors $|\psi_l\rangle$, which belong to the square-summable sequences, can be transformed into an orthonormal set of functions $u_l(t)$ that span the wavelet-generated subspace. These are calculated in the form [6]:

$$u_l(t) = \frac{1}{\sqrt{\lambda_l}} \sum_{k \in Z_M}\sum_{i \in Z_N} \langle k,i|\psi_l\rangle\, \phi_{k,i}(t) \qquad (6)$$

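This new set of functions gives an alternative representation of the signal:

$$f(t) = \sum_{l=1,\ \lambda_l \neq 0}^{MN} c_l\, u_l(t) \qquad (7)$$

where

$$c_l = \langle u_l|f\rangle = \frac{1}{\sqrt{\lambda_l}} \sum_{k \in Z_M}\sum_{i \in Z_N} \langle \psi_l|k,i\rangle\, \langle \phi_{k,i}|f\rangle \qquad (8)$$

Since the transformation (7) is unitary, the norm of the coefficients $c_l$ equals the signal norm, which makes this a more economical representation.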
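Both expansions can be computed from a single SVD of the sampled wavelet matrix, as the text describes. The sketch is illustrative only; none of the following choices is taken from the paper:

- a Mexican-hat mother wavelet;
- dyadic dilations $\phi_{m,n}(t) = 2^{-m/2}\phi(2^{-m}t - n)$;
- a relative threshold that decides which singular values count as nonzero.

The left singular vectors play the role of the sampled orthonormal functions of (6).

```python
import numpy as np

def mexican_hat(t):
    # Hypothetical mother wavelet; the wavelet of [4] is not reproduced here.
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def wavelet_matrix(t, m_values, n_values):
    """Columns are samples of phi_{m,n}(t) = 2^{-m/2} phi(2^{-m} t - n)."""
    cols = [2.0 ** (-m / 2) * mexican_hat(2.0 ** (-m) * t - n)
            for m in m_values for n in n_values]
    return np.column_stack(cols)

def both_representations(f, A, rel_tol=1e-6):
    """Minimum-norm wavelet coefficients (cf. (4)) and orthonormal-basis
    coefficients (cf. (6)-(8)), both obtained from the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0]))  # singular values treated as non-zero
    U, s, Vt = U[:, :r], s[:r], Vt[:r]
    c_orth = U.T @ f                     # c_l: projections onto orthonormal columns
    c_wav = Vt.T @ (c_orth / s)          # c_{m,n}: small sigma_l inflate these
    return c_wav, c_orth, U

t = np.linspace(0.0, 10.0, 500)
A = wavelet_matrix(t, m_values=range(0, 3), n_values=range(-2, 12))
rng = np.random.default_rng(0)
f = np.sin(t) + 0.3 * rng.standard_normal(t.size)

c_wav, c_orth, U = both_representations(f, A)
f_wav, f_orth = A @ c_wav, U @ c_orth
print(np.allclose(f_wav, f_orth, atol=1e-6))             # same approximation of f
print(np.linalg.norm(c_orth), np.linalg.norm(f_orth))    # equal norms
print(np.linalg.norm(c_wav))                             # compare with the line above
```

Both coefficient sets yield the same projection of $f$ onto the subspace. Only the orthonormal coefficients are guaranteed to have the same norm as that projection.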

It can be shown that, for a signal outside the subspace $S$, the wavelet representation (1) and the orthogonal representation (7) give identical approximations of that signal [6].

Let $f^*(t)$ denote the signal corrupted by zero-mean random noise of variance $\sigma^2$. To reduce the effect of the noise, we seek an approximation of $f^*(t)$ in $S$ whose precision takes into account the noise variance, which is assumed known. This precision is used to fix the dimension of the representation subspace $S$.

By definitions (2) and (3), the subspace is fixed by the four numbers $m_1, m_2, n_1, n_2$. Only $m_1$ has to be chosen precisely. The functions $\phi_{m,n}(t)$ become sharper as $m$ decreases, so large negative values of $m$ make them able to reproduce random noise. The upper bound $m_2$ may be overestimated without undue effects. The bounds $n_1$ and $n_2$ of $Z_N$ are merely estimated so that the supports of the functions involved cover the considered time interval. We propose to fix the crucial value $m_1$ as the largest value for which the following holds:

$$\langle |f^*(t) - \hat{f}(t)|^2 \rangle \le \sigma^2 \qquad (9)$$

where $\langle\cdot\rangle$ denotes the mean value, $f^*(t)$ is the noisy signal and $\hat{f}(t)$ is either of the equivalent approximations (1) or (7).
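The rule for fixing $m_1$ can be sketched as a downward scan. Start from the coarsest candidate, and stop at the first (largest) $m_1$ whose approximation leaves a mean squared residual no larger than the known noise variance. The sketch makes several assumptions:

- a Mexican-hat wavelet with dyadic scales;
- per-scale translation ranges chosen to cover the observation interval, a small variation on a single fixed $Z_N$;
- a synthetic test signal.

The approximation uses a least-squares projection, which gives the same result as (1) or (7).

```python
import numpy as np

def mexican_hat(t):
    # Hypothetical mother wavelet.
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def subspace_matrix(t, m1, m2):
    """Sampled phi_{m,n}(t) for m1 <= m <= m2; for each scale the n-range is
    chosen (an assumption of this sketch) so the supports cover [t[0], t[-1]]."""
    cols = []
    for m in range(m1, m2 + 1):
        a = 2.0 ** (-m)
        n_lo = int(np.floor(a * t[0])) - 2
        n_hi = int(np.ceil(a * t[-1])) + 2
        for n in range(n_lo, n_hi + 1):
            cols.append(2.0 ** (-m / 2) * mexican_hat(a * t - n))
    return np.column_stack(cols)

def choose_m1(f_noisy, t, sigma2, m2, m1_candidates):
    """Largest m1 for which mean(|f* - f_hat|^2) <= sigma^2, cf. (9)."""
    for m1 in sorted(m1_candidates, reverse=True):
        A = subspace_matrix(t, m1, m2)
        coeffs, *_ = np.linalg.lstsq(A, f_noisy, rcond=None)
        f_hat = A @ coeffs  # orthogonal projection of f* onto S
        if np.mean((f_noisy - f_hat) ** 2) <= sigma2:
            return m1, f_hat
    return None, None

t = np.linspace(0.0, 8.0, 128)
rng = np.random.default_rng(1)
clean = subspace_matrix(t, 1, 1) @ rng.standard_normal(9)  # lies in the coarse subspace
sigma2 = 0.2
noisy = clean + np.sqrt(sigma2) * rng.standard_normal(t.size)

m1, f_hat = choose_m1(noisy, t, sigma2, m2=1, m1_candidates=[1, 0, -1, -2])
print(m1, np.mean((noisy - f_hat) ** 2))
```

Because lowering $m_1$ only adds finer scales, the subspaces are nested. The residual therefore cannot increase as the scan moves down, which is what makes "the largest $m_1$" well defined.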


2. NUMERICAL TEST

Fig. 2 shows 500 samples of noisy data, simulated by adding Gaussian noise of variance $\sigma^2 = 0.2$ to the original signal, which is plotted as a continuous curve in the same figure. In Fig. 3, the continuous line shows the original clean signal for comparison. The dotted line shows the reconstruction obtained through both the wavelet representation (1) and the orthonormal representation (7).

Fig. 4 shows the coefficients of the wavelet representation; these are very large. By contrast, the triangles in Fig. 5 show the coefficients of the orthonormal expansion, which is clearly a more economical representation.

Fig. 6 shows 600 samples of noisy data, simulated by adding Gaussian noise of variance $\sigma^2 = 0.4$ to a sinusoid whose phase changes randomly at $t = 0.25, 0.5, 0.75$. The continuous curve in Fig. 7 shows the original clean signal for comparison. The dotted line shows the reconstruction obtained through both the wavelet representation (1) and the orthonormal representation (7).

Fig. 8 shows the coefficients of the wavelet representation for this case. They clearly have the same feature as before: they are very large numbers. The orthonormal representation gives a more reasonable set of coefficients, as shown in Fig. 9.

3. CONCLUSIONS

The scheme for data compression and noise reduction presented here uses frame-based wavelets as a starting point. However, we deal with a finite subset of frame elements and build the dual vectors in this subspace. As a result, the inversion becomes an ill-conditioned problem, and the norm of the coefficients used to represent the signal is very large. Our treatment avoids the problem simply by adopting an orthonormal representation for the wavelet-generated subspace. Besides preserving the signal norm, the proposed orthogonal representation eliminates redundancy, which improves compression performance.
Figure 1: Mother wavelet $\phi(t)$

Figure 2: Input noisy data

Figure 3: The continuous curve plots the original signal. The dotted line plots the reconstruction obtained through both the wavelet and orthonormal representations.

Figure 4: The vertical axis shows the wavelet representation coefficients $c_{m,n}$

Figure 5: Orthogonal representation coefficients $c_l$

Figure 6: Input noisy data.

Figure 7: The continuous curve plots the original signal. The dotted line plots the reconstruction obtained through both the wavelet and orthonormal representations.

Figure 8: The vertical axis shows the wavelet representation coefficients $c_{m,n}$

Figure 9: Orthogonal representation coefficients $c_l$
ABSTRACT

This article derives theoretical results for adding a regularization term to the phase diversity algorithm. It also derives the corresponding image estimate that accompanies the regularized phase diversity algorithm. Phase diversity and its advantages are described. The method is outlined and developed theoretically to show how it is implemented with an error metric and nonlinear optimization methods. Along the way, image reconstruction from the phase diversity image estimate is discussed. The phase diversity image estimate is shown to be mathematically ill-posed, which motivates the introduction and further development of regularization.


1.0 INTRODUCTION

The Air Force Phillips Laboratory has been interested for many years in advanced concepts and devices for optical wavefront phase sensing and detection. The phase diversity technique extracts the wavefront phase from two simultaneous images. Typically, one image is the best-focused image and the second has a known, induced defocus aberration. From these two images, the optical system can be fully characterized.

Phase diversity techniques hold immense advantages over conventional optical wavefront phase detection.

First, the method is scene insensitive: the images need not be of a point source, but can be of any type of image.

Second, it is simple to implement. The in-focus and out-of-focus images can even be collected on the same camera if desired, so this simple configuration requires less maintenance.

Third, as a wavefront sensor, phase diversity can detect a piston error between two adjacent telescopes, or between two adjacent segments of a single telescope. This makes it useful in phased array telescopes, segmented telescopes and interferometer designs. Other wavefront sensors, such as the Shack-Hartmann or shearing interferometer sensors, cannot detect piston directly.

Phase diversity also … external, common reference, that of the image. This makes the technique more robust and less susceptible to systematic errors induced by optical hardware. Finally, the technique uses the same photons both to form the image and to perform the aberration detection. This could be advantageous compared with conventional wavefront sensors, which split valuable photons away from the image to a separate sensor for aberration detection.

Phase diversity methods fall into two distinct categories. The first category uses nonlinear optimization techniques to minimize a selected error metric. This metric uses the in-focus and out-of-focus images to search for the optical transfer function (OTF) that best minimizes the error metric. The second category uses the Transport Equation to calculate the optical phase of the wavefront. The chart below summarizes the methods within these two categories. This article studies the error metric category, with the Gonsalves error metric as the error metric.

Thus, phase diversity solves the problem of retrieving phase information from modulus data only. It relies on the hypothesis that one additional modulus measurement, with an induced diverse phase term, suffices to solve the set of equations uniquely. Here the induced diverse term is a defocus term, although the algorithms are not limited to focus as the diverse term. The first image is therefore the best-focused image, which can be collected on a detector plane. The second, defocused image is taken on the same or a similar detector plane. An alternative viewpoint is that phase diversity takes advantage of the three-dimensional characteristics of the diffraction field within an optical system.

Figure(1) Flow chart showing different methods. Chart labels: Transport Eq.; Error metrics; Differential Eq. Solvers; Neural Networks; Mesh Methods; Fast Gradient Methods; Parallel Methods; Wavelength Diversity; Matrix Methods; FFT Methods; Standard Iterative Methods; Piston Segmentation; Lookup Table; Metric Methods.


2.0 IMAGE RECONSTRUCTION USING GONSALVES ERROR METRIC

The mathematical model assumes that the imaging system is continuous, linear and shift invariant. The imaging system used here is shown in Figure(2), where all variables are in the spatial domain:

- $o(x,y)$ is the original object. It goes in two directions: first to the in-focus optical system, and second to the out-of-focus, diverse optical system.
- $i(x,y)$ is the image received from the in-focus optical system, and $i_d(x,y)$ is the diverse image from the out-of-focus optical system.
- $h(x,y)$ is the point spread function (PSF) of the in-focus optical system, and $h_d(x,y)$ is the PSF of the out-of-focus optical system.

In this model, the box labeled $h(x,y)$ in Figure(2) represents both the optical imaging system and the propagation medium, under the assumptions above.

The input-output relationship for the in-focus optical system is:

$$i(x,y) = o(x,y) * h(x,y) \qquad (1)$$

In the Fourier domain this becomes:

$$I(u,v) = O(u,v)H(u,v) \qquad (2)$$

where $I(u,v)$ is the image spectrum, $O(u,v)$ is the object spectrum, and $H(u,v)$ is the Optical Transfer Function (OTF). The OTF is defined as the following autocorrelation:

$$H(u,v) = C(u,v) \otimes C(u,v) \qquad (3)$$

where $C(u,v)$ is the complex pupil function, defined as:

$$C(u,v) = A(u,v)\, e^{j\phi(u,v)} \qquad (4)$$

Here $A(u,v)$ is the pupil function of the optical system, and $\phi(u,v)$ can be expanded in a series as follows:

$$\phi(u,v) = \sum_{n=1}^{\infty} \alpha_n \phi_n(u,v) = \alpha_1 + \alpha_2 R\cos\theta + \alpha_3 R\sin\theta + \alpha_4 (2R^2 - 1) + \cdots + \mathrm{func}(R, \theta) \qquad (5)$$

The $\phi_n$ are the basis functions used in optics to describe the various aberrations; they are the discretized Zernike polynomials. The Zernike coefficients are the $\alpha_n$ multiplying each polynomial: $\alpha_1$ is the piston coefficient, $\alpha_2$ and $\alpha_3$ are the x-tilt and y-tilt coefficients, $\alpha_4$ is the focus coefficient, and so on. The following definitions are used for $R$ and $\theta$:

$$R = \sqrt{u^2 + v^2} \quad \text{and} \quad \theta = \tan^{-1}(v/u) \qquad (6)$$

The corresponding companion equations for the diverse system are:

$$i_d(x,y) = o(x,y) * h_d(x,y) \qquad (7)$$

$$I_d(u,v) = O(u,v)H_d(u,v) \qquad (8)$$

$$H_d(u,v) = C_d(u,v) \otimes C_d(u,v) \qquad (9)$$

where the complex pupil function for the diverse system is:

$$C_d(u,v) = A(u,v)\, e^{j[\phi(u,v) + \Delta\epsilon(u,v)]} \qquad (10)$$

The complex pupil function of the diverse optical system thus has a known defocus term added to the exponent, designated $\Delta\epsilon(u,v)$.
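The chain pupil → PSF → OTF in (3)-(10) can be illustrated numerically. The sketch makes several assumptions that are not taken from the paper:

- a 64×64 frequency grid;
- a circular pupil whose edge defines $R = 1$;
- a few hypothetical Zernike-style coefficients;
- a defocus diversity term of the form $2R^2 - 1$ with an arbitrary amplitude.

The autocorrelation is evaluated through FFTs. That autocorrelation is circular, but here it is free of wrap-around because the pupil occupies less than half of the grid.

```python
import numpy as np

def pupil_grid(n=64, radius=15):
    """Pupil coordinates normalized so that R = 1 on the pupil edge (assumption)."""
    x = np.arange(n) - n // 2
    u, v = np.meshgrid(x, x, indexing="xy")
    R = np.hypot(u, v) / radius
    theta = np.arctan2(v, u)
    A = (R <= 1.0).astype(float)  # pupil function A(u, v)
    return A, R, theta

def otf_from_pupil(C):
    """OTF as the autocorrelation of the complex pupil, computed via the PSF:
    PSF = |inverse FFT of C|^2, and the FFT of the PSF is the (circular)
    autocorrelation of C."""
    psf = np.abs(np.fft.ifft2(C)) ** 2
    H = np.fft.fft2(psf)
    return H / H[0, 0]  # normalize so that H(0, 0) = 1

A, R, theta = pupil_grid()
# Hypothetical coefficients (radians): x-tilt, y-tilt and focus
alpha2, alpha3, alpha4 = 0.3, -0.2, 0.5
phase = alpha2 * R * np.cos(theta) + alpha3 * R * np.sin(theta) + alpha4 * (2 * R ** 2 - 1)
defocus = 1.0 * (2 * R ** 2 - 1)  # known diversity term (hypothetical amplitude)

C = A * np.exp(1j * phase)                # in-focus complex pupil, cf. (4)
Cd = A * np.exp(1j * (phase + defocus))   # diverse complex pupil, cf. (10)
H, Hd = otf_from_pupil(C), otf_from_pupil(Cd)
print(abs(H[0, 0]), np.abs(H).max() <= 1 + 1e-12)
```

The normalized OTF never exceeds its value at the origin, a consequence of the Cauchy-Schwarz inequality. It is also Hermitian, because the PSF is real.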

We start from the two basic frequency-domain equations for the in-focus and diverse optical systems:

$$I(u,v) = O(u,v)H(u,v) \qquad (11)$$

$$I_d(u,v) = O(u,v)H_d(u,v) \qquad (12)$$

Both $I(u,v)$ and $I_d(u,v)$ are computed from the measured image data, while $H(u,v)$, $H_d(u,v)$ and $O(u,v)$ are unknown. An error metric between the measured and diverse images is set up as follows:

$$E = \sum_u\sum_v |I(u,v) - O(u,v)H(u,v)|^2 + |I_d(u,v) - O(u,v)H_d(u,v)|^2 \qquad (13)$$

We take the derivative of $E$ with respect to the object $O(u,v)$, set it equal to zero and solve for $O(u,v)$. After some algebraic manipulation, $O(u,v)$ is found as a function of $H(u,v)$, $H_d(u,v)$, $I(u,v)$ and $I_d(u,v)$:
$$O(u,v) = \frac{H^*(u,v)I(u,v) + H_d^*(u,v)I_d(u,v)}{|H(u,v)|^2 + |H_d(u,v)|^2} \qquad (14)$$


Substiluling this equation into the above error metric with 
appropriate algebraic manipulations yields the following 
error metric equation. ^ 

^ V) - *'^1 

U V v)| +|/^j(», v)| ,|C\ 


We have replaced the OTF’s, H and Hj by A and Hj to 
indicate that these are to be estimated values. Remarkably, 
tliis expression is independent of the original object 0(m,v) 
and was first derived by Gonsalves^’^ Throughout the rest 
of this report this equation is the only error metric used and 
is referred to as the Gonsalves error metric. 
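Equations (13)-(15) can be checked numerically: for any data, substituting the object estimate (14) into (13) gives exactly the object-independent metric (15). The sketch uses random complex arrays as stand-ins for OTFs and spectra. They are not physically generated and serve only to exercise the algebra. In an actual phase diversity loop, $\hat H$ and $\hat H_d$ would be built from trial Zernike coefficients, and the metric passed to the optimizer.

```python
import numpy as np

def gonsalves_object_estimate(I, Id, H, Hd, eps=1e-12):
    """Object spectrum estimate of (14)/(19); eps guards against division by 0."""
    den = np.abs(H) ** 2 + np.abs(Hd) ** 2
    return (np.conj(H) * I + np.conj(Hd) * Id) / np.maximum(den, eps)

def gonsalves_error_metric(I, Id, H_hat, Hd_hat, eps=1e-12):
    """Object-independent Gonsalves error metric (15)."""
    num = np.abs(I * Hd_hat - Id * H_hat) ** 2
    den = np.abs(H_hat) ** 2 + np.abs(Hd_hat) ** 2
    return float(np.sum(num / np.maximum(den, eps)))

def metric_13(I, Id, H, Hd, O):
    """Two-image error metric (13) for a given object spectrum O."""
    return float(np.sum(np.abs(I - O * H) ** 2 + np.abs(Id - O * Hd) ** 2))

rng = np.random.default_rng(0)

def cplx(shape=(16, 16)):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H, Hd, O = cplx(), cplx(), cplx()
I, Id = O * H + 0.01 * cplx(), O * Hd + 0.01 * cplx()  # slightly noisy data

O_hat = gonsalves_object_estimate(I, Id, H, Hd)
print(np.isclose(metric_13(I, Id, H, Hd, O_hat), gonsalves_error_metric(I, Id, H, Hd)))  # True
print(gonsalves_error_metric(I, Id, H, Hd) < gonsalves_error_metric(I, Id, Hd, H))      # True
```

The second check shows the metric is far smaller at the true OTFs than at a wrong pair (here the two OTFs swapped), which is the property the optimizer exploits.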


Given this error metric, one possible strategy is to minimize it by estimating the coefficients of the OTF. The minimization can be done with nonlinear optimization techniques, such as conjugate gradient or simplex algorithms. Conjugate gradient is suggested because it is well understood, fast, and has workable memory requirements. In addition, many conjugate gradient routines are available in scientific software packages. The routine suggested for this research comes from the IMSL libraries and uses finite differences to calculate the gradient.


Given this estimated OTF, $\hat{H}(u,v)$, the in-focus image was used in an inverse filter derived from Equation(2). The Fourier transform of the object is then calculated as

$$\hat{O}(u,v) = \frac{I(u,v)}{\hat{H}(u,v)} \qquad (16)$$


The inverse filter is mathematically ill-posed and numerically unstable. The problem arises when the OTF has zeros, which cause singularities in the object estimate. This ill-posed condition can be solved by regularization. The parametric Wiener filter is an optimized least-squares filter that essentially adds a regularization term to the denominator for image recovery. With the parametric Wiener filter, the Fourier transform of the object is calculated as:

$$\hat{O}(u,v) = \frac{\hat{H}^*(u,v)\, I(u,v)}{|\hat{H}(u,v)|^2 + \gamma\left[\dfrac{P_n(u,v)}{P_o(u,v)}\right]} \qquad (17)$$

where $P_o(u,v)$ is the power spectrum of the object, $P_n(u,v)$ is the power spectrum of the noise, and $\gamma$ is a user-selectable parameter. When reconstructing actual data, however, the object is unavailable for computing its power spectrum, since the object is what we are seeking.


Therefore, many applications replace the power spectra with the signal-to-noise ratio (SNR) of the image:

$$\hat{O}(u,v) = \frac{\hat{H}^*(u,v)\, I(u,v)}{|\hat{H}(u,v)|^2 + \dfrac{\gamma}{\mathrm{SNR}_i(u,v)}} \qquad (18)$$

These two object estimates can be calculated on every iteration of the phase diversity calculation, or only at the end. In most applications they are calculated only at the end, once the phase diversity iterations have produced the best estimate of the OTF. However, they can be calculated at each iteration if there is reason to believe that doing so will give more favorable results, such as a better OTF estimate or faster convergence.
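The difference between the inverse filter (16) and the parametric Wiener filter (17)/(18) appears as soon as the OTF has a zero. Below is a minimal sketch with hypothetical one-dimensional OTF samples and a scalar noise-to-signal ratio:

```python
import numpy as np

def inverse_filter(I, H):
    """Inverse filter (16): unbounded where H = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return I / H

def parametric_wiener(I, H, gamma, noise_to_signal):
    """Parametric Wiener filter (17); noise_to_signal is P_n/P_o, or 1/SNR as in (18)."""
    return np.conj(H) * I / (np.abs(H) ** 2 + gamma * noise_to_signal)

H = np.array([1.0, 0.5, 1e-3, 0.0], dtype=complex)   # OTF samples, one exact zero
O = np.array([1.0, 2.0, 3.0, 4.0], dtype=complex)
noise = np.array([0.01, -0.01, 0.01, 0.01], dtype=complex)
I = O * H + noise

print(inverse_filter(I, H))                # amplifies noise near and at the zero of H
print(parametric_wiener(I, H, 1.0, 0.01))  # remains finite everywhere
```

With $\gamma = 0$ and a nonzero OTF, the Wiener expression reduces exactly to the inverse filter. The extra denominator term is what keeps it bounded.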


Equations (16), (17) and (18) are well understood and well documented. The authors therefore concentrate on Equation(14), rewritten below and referred to as the Gonsalves object estimate:

$$\hat{O}_{\mathrm{gons}}(u,v) = \frac{\hat{H}^*(u,v)I(u,v) + \hat{H}_d^*(u,v)I_d(u,v)}{|\hat{H}(u,v)|^2 + |\hat{H}_d(u,v)|^2} \qquad (19)$$

One of the first things observed in Equation(19) is its similarity to the Wiener filter, Equations (17) and (18). In fact, Equation(19) is also numerically ill-posed, the same drawback found in the inverse filter, Equation(16). The Wiener filter overcomes the ill-posed nature of the inverse problem by adding an extra term to the denominator. A solution sometimes proposed is therefore to add a similar Wiener-type term to the denominator of Equation(19):


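$$\hat{O}_{\mathrm{gons}}(u,v) = \frac{\hat{H}^*(u,v)I(u,v) + \hat{H}_d^*(u,v)I_d(u,v)}{|\hat{H}(u,v)|^2 + |\hat{H}_d(u,v)|^2 + \gamma\left[\dfrac{P_n(u,v)}{P_o(u,v)}\right]} \qquad (20)$$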


However, this equation is ad hoc: there is no analytical derivation or mathematical justification for the extra term in the denominator. Nor is there any indication of the effect this term would have on the error metric being minimized. The next section addresses these shortcomings.


3.0 REGULARIZATION FOR THE GONSALVES ERROR METRIC



This section gives a mathematically tractable derivation, by the method of regularization, of the regularized Gonsalves error metric and object estimate. The method is similar to the Wiener filter development in the text by Andrews and Hunt. Using the method of Lagrange multipliers, we add the regularization term to the original error metric of Equation(13) to obtain the following new metric:

$$E = \sum_u\sum_v \Big(|I(u,v) - O(u,v)H(u,v)|^2 + |I_d(u,v) - O(u,v)H_d(u,v)|^2 + \beta\,|Q(u,v)O(u,v)|^2\Big) \qquad (21)$$

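where $\beta$ is the Lagrange multiplier and $Q$ is a weighting matrix of the same dimension as the object spectrum. $Q$ could be a smoothing matrix or a matrix associated with the SNR. Rewriting Equation(21) without the spatial frequency variables $(u,v)$ gives:

$$E = \sum_u\sum_v (I - OH)(I - OH)^* + (I_d - OH_d)(I_d - OH_d)^* + \beta(QO)(QO)^* \qquad (22)$$

Next, we take the derivative with respect to the object $O$:

$$\frac{\partial E}{\partial O} = \sum_u\sum_v \frac{\partial}{\partial O}\big[(I - OH)(I - OH)^*\big] + \frac{\partial}{\partial O}\big[(I_d - OH_d)(I_d - OH_d)^*\big] + \beta\left[\frac{\partial (QO)}{\partial O}\right](QO)^* + \beta(QO)\left[\frac{\partial (QO)^*}{\partial O}\right] \qquad (23)$$

After some algebraic manipulation, treating $O$ and $O^*$ as independent variables, we obtain:

$$\frac{\partial E}{\partial O} = \sum_u\sum_v H(HO - I)^* + H_d(H_dO - I_d)^* + \beta Q(QO)^* \qquad (24)$$

Taking the complex conjugate of this equation gives:

$$\left(\frac{\partial E}{\partial O}\right)^* = \sum_u\sum_v H^*(HO - I) + H_d^*(H_dO - I_d) + \beta Q^*QO \qquad (25)$$

Expanding it gives:

$$\left(\frac{\partial E}{\partial O}\right)^* = \sum_u\sum_v |H|^2 O - H^*I + |H_d|^2 O - H_d^*I_d + \beta|Q|^2 O \qquad (26)$$

As before, we set this equation to zero and solve for the object $O$:

$$O = \frac{H^*I + H_d^*I_d}{|H|^2 + |H_d|^2 + \beta Q^*Q} \qquad (27)$$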

As shown in Andrews and Hunt, $Q$ may be defined in terms of the power spectra, which makes Equation(27) very similar to a Wiener filter. That is, if

$$Q = \sqrt{P_n/P_o}, \quad \text{then} \quad Q^*Q = P_n/P_o.$$

Andrews and Hunt also show an alternative way to choose $Q$: to minimize the second (or higher) difference energy of the estimated object. For the second difference, $Q = Q_1^{T} Q_1$, where the product is a matrix multiplication and $Q_1$ is the following tridiagonal matrix:


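$$Q_1 = \begin{bmatrix} -2 & 1 & 0 & 0 & 0 & \cdots \\ 1 & -2 & 1 & 0 & 0 & \cdots \\ 0 & 1 & -2 & 1 & 0 & \cdots \\ 0 & 0 & 1 & -2 & 1 & \cdots \\ 0 & 0 & 0 & 1 & -2 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$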


Such operator constraints guarantee that the object estimate does not oscillate wildly in the constrained solution, because they minimize higher-order differences.
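The second-difference operator is simple to build and inspect. This sketch uses NumPy with arbitrary sizes. It does two things:

- constructs the tridiagonal $Q_1$ and shows that it annihilates linear trends away from the boundary rows;
- assuming periodic boundary conditions, so that the operator becomes diagonal in the DFT domain, gives the squared frequency response $|Q(u)|^2 = 16\sin^4(\pi u/n)$. This is what would enter a frequency-domain weighting such as the $\beta|Q|^2$ term.

```python
import numpy as np

def second_difference_matrix(n):
    """Tridiagonal Q_1 with -2 on the diagonal and 1 on the off-diagonals."""
    return -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)

n = 8
Q1 = second_difference_matrix(n)
ramp = np.arange(n, dtype=float)
print(Q1 @ ramp)  # interior entries vanish: linear trends are not penalized

# Circulant (periodic) version of the same kernel, whose DFT gives the
# squared frequency response |Q(u)|^2 = 16 sin^4(pi u / n).
kernel = np.zeros(n)
kernel[0], kernel[1], kernel[-1] = -2.0, 1.0, 1.0
Q2 = np.abs(np.fft.fft(kernel)) ** 2
u = np.arange(n)
print(np.allclose(Q2, 16 * np.sin(np.pi * u / n) ** 4))  # True
```

$|Q|^2$ vanishes at zero frequency and grows toward the highest frequencies. The penalty therefore acts mainly on rapidly oscillating components of the object estimate.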

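Continuing with the analysis, the complex conjugate of Equation(27) is:

$$O^* = \frac{HI^* + H_dI_d^*}{|H|^2 + |H_d|^2 + \beta|Q|^2} \qquad (28)$$

We define the denominator as $D$ and the two numerators as follows:

$$D = |H|^2 + |H_d|^2 + \beta|Q|^2 \qquad (29)$$

$$N = H^*I + H_d^*I_d \qquad (30)$$

$$N^* = HI^* + H_dI_d^* \qquad (31)$$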

These are then substituted into the regularized Gonsalves error metric, using the slightly rearranged Equation(22):

$$E = \sum_u\sum_v |I|^2 + |I_d|^2 - O(HI^* + H_dI_d^*) - O^*(H^*I + H_d^*I_d) + |HO|^2 + |H_dO|^2 + \beta|QO|^2 \qquad (32)$$

This gives:

$$E = \sum_u\sum_v |I|^2 + |I_d|^2 - ON^* - O^*N + |O|^2\big(|H|^2 + |H_d|^2 + \beta|Q|^2\big) \qquad (33)$$

The last quantity in parentheses is exactly the denominator $D$. After some algebraic manipulation, we arrive at:

$$E = \sum_u\sum_v |I|^2 + |I_d|^2 - \frac{|N|^2}{D} \qquad (34)$$

Substituting for $D$ and $N$ and expanding, further algebra finally gives the regularized error metric:

$$E = \sum_u\sum_v \frac{|IH_d - I_dH|^2 + \beta|Q|^2\big(|I|^2 + |I_d|^2\big)}{|H|^2 + |H_d|^2 + \beta|Q|^2} \qquad (35)$$

The values of $Q$ used in this error metric should also be used in the reconstruction, that is, in the phase diversity object estimate with the regularization term. Equation(27) is rewritten here:

$$\hat{O}(u,v) = \frac{\hat{H}^*I + \hat{H}_d^*I_d}{|\hat{H}|^2 + |\hat{H}_d|^2 + \beta|Q|^2} \qquad (36)$$
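The regularized estimate (36) and the regularized metric (35) take only a few lines of code, and the derivation can be spot-checked in two ways:

- evaluating (22) at the estimate must reproduce (35);
- setting $\beta = 0$ must recover the Gonsalves metric (15).

This sketch uses random complex arrays in place of measured spectra and OTFs, and a hypothetical separable second-difference weighting for $|Q|^2$.

```python
import numpy as np

def regularized_object_estimate(I, Id, H, Hd, beta, Q2):
    """Regularized Gonsalves object estimate (27)/(36); Q2 holds |Q(u,v)|^2."""
    D = np.abs(H) ** 2 + np.abs(Hd) ** 2 + beta * Q2
    return (np.conj(H) * I + np.conj(Hd) * Id) / D

def regularized_error_metric(I, Id, H, Hd, beta, Q2):
    """Object-independent regularized error metric (35)."""
    num = np.abs(I * Hd - Id * H) ** 2 + beta * Q2 * (np.abs(I) ** 2 + np.abs(Id) ** 2)
    D = np.abs(H) ** 2 + np.abs(Hd) ** 2 + beta * Q2
    return float(np.sum(num / D))

def regularized_metric_direct(I, Id, H, Hd, beta, Q2, O):
    """Metric (21)/(22) evaluated for a given object spectrum O."""
    return float(np.sum(np.abs(I - O * H) ** 2 + np.abs(Id - O * Hd) ** 2
                        + beta * Q2 * np.abs(O) ** 2))

rng = np.random.default_rng(0)

def cplx(shape=(16, 16)):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

I, Id, H, Hd = cplx(), cplx(), cplx(), cplx()
u = np.fft.fftfreq(16)
q1 = 16 * np.sin(np.pi * u) ** 4   # 1-D periodic second-difference response
Q2 = np.add.outer(q1, q1)          # hypothetical separable 2-D weighting
beta = 0.1

O_reg = regularized_object_estimate(I, Id, H, Hd, beta, Q2)
print(np.isclose(regularized_metric_direct(I, Id, H, Hd, beta, Q2, O_reg),
                 regularized_error_metric(I, Id, H, Hd, beta, Q2)))  # True
```

In an actual reconstruction, the same `beta` and `Q2` would be used both in the metric handed to the optimizer and in the final object estimate, as the text recommends.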


4.0 CONCLUSIONS USING THE PHASE DIVERSITY REGULARIZATION ERROR METRIC

We hypothesize that the regularized Gonsalves object estimate, used together with its accompanying regularized Gonsalves error metric, should produce good results. Computer simulations should follow to verify or disprove this hypothesis. The authors challenge new users to incorporate any a priori knowledge of their specific problem, via the $\beta$ parameter and the $Q$ matrix, to produce improved results. The regularized Gonsalves object estimate should also be compared with Wiener-filter reconstructions. We also believe a more concise signal-to-noise analysis of the regularized phase diversity filter is needed, to see whether results can be improved.
ABSTRACT

The properties of Higher-Order Statistics (HOS) open a new perspective on the problem of signal detection, and of speech presence detection in particular. In essence, we exploit both the frequency richness of speech and the ability of HOS to suppress additive symmetrically distributed noise, as well as to discern and extract information about deviations from Gaussianity and about non-stationarities. Theoretical and experimental results have led us to two functions. One is obtained from the principal domain of the bispectrum, and the other is the integrated polyspectrum. In this paper we propose how to use these functions properly and apply them to a simple speech detection system.

1. PROBLEM STATEMENT 

Methods for the automatic detection of the beginning and ending points of an utterance are required in many speech processing applications, for either isolated or continuous speech. The most classic methods for speech presence detection divide the signal into data frames, from which some kind of feature is estimated and extracted. The most widely used features are short-time energy and its variations, zero crossings, combinations of autocorrelation lags, etc. [7]. Once extracted, this information can be applied to a detection system. When no disturbance is present in the signal, the detection scores are quite good. However, when the signal is embedded in additive noise and the SNR is low, poor results arise because speech cannot be discerned from noise. This is especially the case for sounds such as plosives and fricatives, since they are very often low in energy and strongly affected by noise. Therefore, in a frame-based method, it is difficult to discriminate them from noise.

2. SPEECH DETECTION BY MEANS 
OF HIGHER-ORDER SPECTRA 

More specifically, an important and attractive property of HOS is that the HOS of the sum of two independent random processes equals the sum of their individual HOS. A key characteristic of HOS, from the detection point of view, is their ability to suppress any kind of Gaussian process. In practice, this means that when HOS-based methods are applied to detect non-Gaussian signals corrupted by additive Gaussian noise, they automatically improve the results at a given signal-to-noise ratio compared with classical autocorrelation-based methods. In particular, odd-order HOS (e.g., third-order spectra or bispectra) suppress any kind of symmetrically distributed process. Moreover, HOS are able to detect non-stationarities.
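Both properties can be seen with the simplest third-order statistic, the zero-lag cumulant c3(0,0), which is the third central moment. In the sketch below (seed, sample size and distributions are arbitrary choices of ours), symmetric Gaussian noise gives a value near 0, a skewed unit-rate exponential gives a value near 2, and adding independent Gaussian noise to the exponential leaves the cumulant essentially unchanged. Lags are assumed non-negative.

```python
import random


def third_order_cumulant(x, k=0, l=0):
    """Sample c3(k, l) = E{x(n) x(n+k) x(n+l)} after removing the mean (k, l >= 0)."""
    mean = sum(x) / len(x)
    x = [v - mean for v in x]  # third cumulant = third central moment
    n = len(x) - max(k, l)
    return sum(x[i] * x[i + k] * x[i + l] for i in range(n)) / n


rng = random.Random(1)
N = 200_000
gauss = [rng.gauss(0.0, 1.0) for _ in range(N)]   # symmetric noise
expo = [rng.expovariate(1.0) for _ in range(N)]   # skewed process, c3(0,0) = 2
mixed = [e + g for e, g in zip(expo, gauss)]      # sum of independent processes

c_gauss = third_order_cumulant(gauss)
c_expo = third_order_cumulant(expo)
c_mixed = third_order_cumulant(mixed)
print(round(c_gauss, 2), round(c_expo, 2), round(c_mixed, 2))
```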

Many real-world processes are non-symmetrically distributed, and measurement noise can often be realistically described as a stationary, symmetrically distributed (e.g., Gaussian) process. Furthermore, a process of interest in a stationary noise background becomes a non-stationarity that can be detected by HOS-based methods. Owing to the richness of speech in the frequency dimension, a method that explores the signal in the frequency dimension should lead to improved detection results over classic ones. Moreover, since speech-silence transitions carry non-stationarities, a representation specifically designed to detect non-stationarities should improve results even more. Therefore, it is possible to go a step further in the way we extract information from speech by explicitly exploiting its polyfrequency content.

In this paper we do this by means of three bispectrum-based functions. First, the Integrated Polyspectrum (IP) [8] has been proposed for detecting an unknown, random, stationary, non-Gaussian signal in Gaussian noise. The IP can be seen as the integration of the polyfrequencies along one dimension. It shares all the general properties of HOS. Also, IP estimators are robust, consistent, and computationally efficient. We use the integrated bispectrum for speech detection. Second, we can also use the bispectrum itself, exploiting its ability to discern and extract information about deviations from Gaussianity and about non-stationarities. This is done using two regions in the principal domain, the Inner Triangle (IT) and the Outer Triangle (OT). The basic concepts related to the use of HOS for speech detection are outlined below.

2.1. GAUSSIANITY AND NON- 
STATIONARITY TESTS 

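Let s(n) be a discrete-time, real, zero-mean, stationary, non-Gaussian random process. Its third-order cumulant and the Fourier transform of that cumulant, the bispectrum, are [4]

$$ c_3(k,l) = E\{ s(n)\, s(n+k)\, s(n+l) \} \qquad (1) $$

$$ B(f,g) = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} c_3(k,l)\, e^{-j2\pi(fk + gl)} \qquad (2) $$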


A consistent estimator of B(f,g) is required. There are two methods, indirect and direct. We use the direct one, in which a signal frame is divided into K records of M points, the bispectrum of each record is computed, and the K individual estimates are averaged.
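A pure-Python sketch of the direct estimator as just described: split the frame into K non-overlapping records of M points, remove each record's mean, form X(f) X(g) X*(f+g) from the record's DFT, and average. The 1/(K·M) scaling, the absence of a data window, and the plain DFT (standing in for an FFT) are our assumptions. The test signal is a triplet of tones at bins F1, F2 and F1+F2. When the third phase equals the sum of the first two (quadratic phase coupling), |B(F1, F2)| equals M^2/8 = 32 for M = 16; with an independent third phase the average tends toward zero.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(M^2) DFT: X(f) = sum_n x(n) exp(-j 2 pi f n / M)."""
    m = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * f * n / m) for n in range(m))
            for f in range(m)]


def direct_bispectrum(frame, m):
    """Average of X(f) X(g) X*(f+g) over K = len(frame) // m non-overlapping records."""
    k_records = len(frame) // m
    b = [[0j] * m for _ in range(m)]
    for r in range(k_records):
        rec = frame[r * m:(r + 1) * m]
        mu = sum(rec) / m
        x = dft([v - mu for v in rec])  # DFT of the zero-mean record
        for f in range(m):
            for g in range(m):
                b[f][g] += x[f] * x[g] * x[(f + g) % m].conjugate()
    scale = k_records * m  # assumed normalization 1 / (K M)
    return [[v / scale for v in row] for row in b]


def tone_triplet(m, freqs, phases):
    """Sum of cosines at integer DFT bins `freqs` with the given phases."""
    return [sum(math.cos(2 * math.pi * fr * n / m + ph) for fr, ph in zip(freqs, phases))
            for n in range(m)]


rng = random.Random(2)
M, K, F1, F2 = 16, 100, 2, 3
coupled, uncoupled = [], []
for _ in range(K):
    p1, p2, p3 = (rng.uniform(0, 2 * math.pi) for _ in range(3))
    coupled += tone_triplet(M, (F1, F2, F1 + F2), (p1, p2, p1 + p2))  # phase-coupled
    uncoupled += tone_triplet(M, (F1, F2, F1 + F2), (p1, p2, p3))     # independent third phase
b_coupled = direct_bispectrum(coupled, M)
b_uncoupled = direct_bispectrum(uncoupled, M)
print(abs(b_coupled[F1][F2]), abs(b_uncoupled[F1][F2]))
```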

Focusing on the principal domain of B(f,g), we can distinguish two regions [1,2,3]. One is the Inner Triangle, where the bispectrum is non-vanishing for continuous-time, stationary, non-Gaussian, unaliased processes. The other is the Outer Triangle, where the bispectrum will usually be nonzero when the process is either non-stationary or aliased.


In [3] the authors study the ability of the bispectrum to detect non-Gaussian signals masked by either Gaussian or non-Gaussian stationary noise. They propose a detection test using the bicoherence function evaluated in the IT region. In this function the effect of the noise is mitigated by subtracting its bispectrum from the signal bispectrum. Signal presence and transitions can also be detected by testing for changes of stationarity [1]. When there is silence alone, or only stationary noise is present, the expected value of B(f,g) in the OT is zero, even for non-Gaussian noise. The importance of restricting attention to the OT is that we can detect the presence of non-stationarities in stationary noise. This is precisely the situation at a (noisy) silence/signal transition.

2.2. SIGNAL DETECTION USING THE IP

In parallel with the development of the work presented in [1,2,3], another bispectral (in general, polyspectral) function has appeared, with potential application to signal detection [8] and speech detection [5]. This is the integrated bispectrum, which is defined as follows. Let \hat{T}_b(w_m) be a consistent estimator of the integrated bispectrum, obtained by averaging K individual estimates from non-overlapping records. Then the detection function T_b is defined as

$$ T_b = \frac{1}{\sigma^2} \sum_{m} \left| \hat{T}_b(w_m) \right|^2 \qquad (3) $$

where σ^2 is the noise variance. T_b is related to the detection functions from the IT and OT in the sense that it integrates both kinds of information at once.
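One convenient way to compute an integrated-bispectrum estimate uses the identity sum_g X(f) X(g) X*(f+g) = M · X(f) · conj(Z(f)), where X is the DFT of a real, zero-mean record of M points and Z is the DFT of the squared record. Summing the bispectrum along one frequency axis therefore reduces to a cross-spectrum between x^2 and x. The sketch below uses that identity and checks it against the brute-force double sum. The scaling and the hypothetical `detection_statistic` helper (sum of squared magnitudes divided by the noise variance) are assumptions about the exact form of T_b, not the authors' code.

```python
import cmath
import random


def dft(x):
    """Plain O(M^2) DFT: X(f) = sum_n x(n) exp(-j 2 pi f n / M)."""
    m = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * f * n / m) for n in range(m))
            for f in range(m)]


def integrated_bispectrum(frame, m):
    """Average over K records of X(f) conj(Z(f)), with Z the DFT of the squared record.

    For a real zero-mean record this equals sum_g X(f) X(g) X*(f+g) / m,
    i.e. the direct bispectrum estimate summed along g.
    """
    k_records = len(frame) // m
    acc = [0j] * m
    for r in range(k_records):
        rec = frame[r * m:(r + 1) * m]
        mu = sum(rec) / m
        x = [v - mu for v in rec]
        big_x, big_z = dft(x), dft([v * v for v in x])
        for f in range(m):
            acc[f] += big_x[f] * big_z[f].conjugate()
    return [a / k_records for a in acc]


def detection_statistic(ib, noise_variance):
    """Hypothetical T_b-style statistic: sum_m |IB(w_m)|^2 divided by the noise variance."""
    return sum(abs(v) ** 2 for v in ib) / noise_variance


# Check the identity on one random record (K = 1) against the brute-force double sum.
rng = random.Random(3)
M = 16
record = [rng.expovariate(1.0) for _ in range(M)]
ib = integrated_bispectrum(record, M)
mean = sum(record) / M
X = dft([v - mean for v in record])
brute = [sum(X[f] * X[g] * X[(f + g) % M].conjugate() for g in range(M)) / M for f in range(M)]
print(max(abs(a - b) for a, b in zip(ib, brute)))  # round-off level
```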

Our previous work on real speech embedded in synthetic noise [5,6] demonstrated that the functions mentioned in this section are suitable for speech detection. However, it is necessary to study the best way to use them with real noises and to validate the initial results. In the next sections we present our progress in this direction by applying the detection functions to a threshold-based detection system.

3. SPEECH DETECTION SYSTEM

Since the information about non-stationarities and the information about deviations from Gaussianity can be used separately, we obtain two detection functions [1,3] as the sum of the squared modulus of the bispectrum over the bifrequencies in the IT and in the OT; let us call them F_IT (associated with speech presence) and F_OT (associated with speech presence and non-stationarities), respectively.

$$ F_{IT} = \sum_{(f,g) \in IT} |B(f,g)|^2 \qquad (4) $$

$$ F_{OT} = \sum_{(f,g) \in OT} |B(f,g)|^2 \qquad (5) $$
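A sketch of Equations (4) and (5) on a bispectrum array indexed by DFT bins (f, g) of an m-point transform, such as a direct estimate would produce, together with the difference F_OIT = F_IT - F_OT adopted in the next paragraph. The exact triangle boundaries are an assumption: we use the common discrete-time principal-domain convention with normalized frequencies f/m and g/m, IT = {0 < g <= f, f + g < m/2} and OT = {0 < g <= f, f + g >= m/2, 2f + g < m}. Integer comparisons avoid fractional bin limits.

```python
def region_masks(m):
    """Bins (f, g) in the inner (IT) and outer (OT) triangles for an m-point DFT (assumed convention)."""
    it, ot = [], []
    for f in range(m):
        for g in range(1, f + 1):
            if 2 * (f + g) < m:          # f/m + g/m < 1/2
                it.append((f, g))
            elif 2 * f + g < m:          # f/m + g/m >= 1/2 and 2f/m + g/m < 1
                ot.append((f, g))
    return it, ot


def triangle_energies(b):
    """F_IT, F_OT and F_OIT = F_IT - F_OT for a square bispectrum array b."""
    it, ot = region_masks(len(b))
    f_it = sum(abs(b[f][g]) ** 2 for f, g in it)
    f_ot = sum(abs(b[f][g]) ** 2 for f, g in ot)
    return f_it, f_ot, f_it - f_ot


ones = [[1.0] * 16 for _ in range(16)]   # flat |B| = 1: each sum just counts bins
print(triangle_energies(ones))           # (12.0, 7.0, 5.0)
```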


We have found experimentally that both functions should be used jointly to obtain the best detection scores. In [5,6] we proposed the quotient F = F_OT / F_IT, since it seemed the best choice in noisy environments. However, as the SNR increases it becomes very difficult to apply a threshold, because the quotient becomes larger during silences than in the speech segments and too many false detections appear. The best way to use F_OT and F_IT is by means of the difference F_OIT = F_IT - F_OT. Ideally, this detection function is zero during noisy silences if the noise is symmetrically distributed and stationary, thanks to the statistical properties of the bispectrum. In practice, it takes very low and very smooth values, because F_OT eliminates the effect of very short local non-stationarities, avoiding false detections. At speech/silence transitions there is a change in the time-frequency content of the signal, associated with both a non-stationarity and speech presence. This appears as an abrupt change in the function, which is easily detectable with a threshold. Consequently, after applying a proper threshold to this function, detection of speech presence and transitions becomes reliable and precise.

Of course, one may wonder what happens with more realistic noises. That is, if the noise is not symmetrically distributed and/or is non-stationary, or is simply unknown, is F_OIT still suitable? The answer is not obvious, because non-stationary noises produce the worst cases we can meet. At first glance, from the properties of HOS, the answer is expected to be affirmative, since F_OT is sensitive to non-stationarities and smooths F_OIT. In practice, this expectation is only approximately met, because F_OIT cannot suppress every disturbance; for instance, short-time strong noise peaks may cause false alarms. The experiments in the next section show that even in this case F_OIT is more reliable and accurate than the energy.

The (poly)frequency dimension provides an additional degree of freedom that we can take advantage of. For instance, if we assume that the noise is stationary over long intervals (which is the case in many realistic situations), it is possible to perform a polyspectral subtraction using the noise polyspectrum obtained during silences, e.g., before the beginning of an utterance. Thus, the noise effect could be reduced even if the noise is not a symmetrically distributed process. Our experiments show that bispectral subtraction for F_OIT is computationally expensive and degrades detection when the noise is not stationary. However, the behavior of T_b is reinforced by noise subtraction, because false alarms are strongly reduced.
$$ T_b' = \sum_{m} \left| \hat{T}_b(w_m) - \hat{T}_n(w_m) \right|^2 \qquad (6) $$

where \hat{T}_n(w_m) is the integrated bispectrum of the noise, estimated during silences.
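Equation (6) compares each frame's integrated bispectrum with a noise reference. A minimal sketch, assuming the reference is the bin-wise average of estimates from frames labeled as silence (for example, frames before the utterance). The function names and the toy numbers are hypothetical.

```python
def noise_reference(silence_estimates):
    """Bin-wise mean of integrated-bispectrum estimates from frames assumed to be silence."""
    count = len(silence_estimates)
    return [sum(column) / count for column in zip(*silence_estimates)]


def subtracted_statistic(ib_frame, ib_noise):
    """Sum over bins of |IB_frame(w_m) - IB_noise(w_m)|^2, in the spirit of Eq. (6)."""
    return sum(abs(a - b) ** 2 for a, b in zip(ib_frame, ib_noise))


silence = [[1 + 1j, 2.0], [3 + 1j, 4.0]]         # two silence frames, two frequency bins
ref = noise_reference(silence)                   # [(2+1j), 3.0]
print(subtracted_statistic([2 + 1j, 5.0], ref))  # 4.0
```

Averaging over several silence frames reduces the variance of the reference, which is what makes the subtraction useful when the noise is stationary.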

We use a simple speech detection scheme in which detection is on/off when the detection function is over/under a given threshold. Information is extracted from overlapping frames of the signal. For comparison purposes, the reference detection function is the energy-based one, and we study the possibility of replacing the energy-based (En) function with the HOS-based one. Functions other than energy could be used, e.g., zero-crossing rate, autocorrelation lags, etc. However, in noisy environments they impair detection. Therefore, energy seems to be a good choice for comparisons.
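The on/off scheme can be sketched as follows. Any detection function can be passed in as `feature`: here the energy reference, and in the paper an HOS-based function such as F_OIT or T_b. Frame length and shift default to the 300- and 50-sample values given in the experiments section. The fixed threshold and the toy signal are simplifications of ours.

```python
def frames(signal, length=300, shift=50):
    """Overlapping analysis frames (300 samples with a 50-sample shift by default)."""
    return [signal[s:s + length] for s in range(0, len(signal) - length + 1, shift)]


def short_time_energy(frame):
    """Reference (En) detection function: sum of squared samples."""
    return sum(v * v for v in frame)


def on_off(signal, feature, threshold, length=300, shift=50):
    """Detection is on for every frame whose feature value exceeds the threshold."""
    return [feature(fr) > threshold for fr in frames(signal, length, shift)]


signal = [0.0] * 400 + [1.0] * 200 + [0.0] * 400   # toy "utterance" between two silences
decisions = on_off(signal, short_time_energy, threshold=100.0)
print(len(decisions), decisions.index(True))       # 15 5
```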

4. EXPERIMENTS AND RESULTS 

The experiments have been carried out with a database of 1083 English isolated digits sampled at 8 kHz. This database contains several examples of fricative and occlusive sounds, such as /s/, /f/, /t/, /th/, etc. The noises cover a wide range of cases: synthetic stationary Gaussian and exponential noise, internal and external telephone line, air conditioning, car engine at 2000 and 3000 rpm, keyboard, and fan. All noise realizations are independent of each other.

Before showing and commenting on the results, some aspects must be pointed out. First, thresholds are computed adaptively and depend on the mean and variance of the detection function over some preceding estimates (about 10 are enough). Adaptation stops when the detection is on and starts again when a new silence is detected. Second, the analysis frames are 37.5 ms (300 samples) long and the time shift between consecutive frames is 6.25 ms (50 samples). These values are a compromise between acceptable time precision, good use of the properties of the functions, and computation time. Third, the exact detection instant within the frame is undetermined. The analytic solution to this problem appears quite complex, so we have chosen a heuristic solution, which states that a detection is acceptable when the signal inside the frame (sliding from left to right) fills at least a third of its length for F_OIT and T_b, and half of its length for the energy function.
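A sketch of the adaptive threshold described above: the threshold is built from the mean and spread of the detection function over the last 10 silence estimates, and the buffer is frozen while detection is on. The multiplier `alpha`, the use of the standard deviation, and the assumption that the signal starts with silence are our choices; the text does not specify them.

```python
import statistics


def adaptive_detect(values, history=10, alpha=3.0):
    """On/off decisions with threshold mean + alpha * std over recent silence values.

    The first `history` values are assumed to be silence; the buffer is frozen
    while detection is on and resumes updating when silence returns.
    """
    buffer = list(values[:history])
    decisions = []
    for v in values:
        threshold = statistics.mean(buffer) + alpha * statistics.pstdev(buffer)
        on = v > threshold
        decisions.append(on)
        if not on:  # adapt only during silence
            buffer = (buffer + [v])[-history:]
    return decisions


values = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0] + [5.0] * 5 + [1.0] * 5
decisions = adaptive_detect(values)
print(decisions)
```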

Comparisons between the energy-based and HOS-based functions have been made taking into account precision and reliability. For this purpose we have distinguished between beginning and ending points. The graphics show, for different SNRs (from … to 0 dB), the average results over all noises. In them, the energy, integrated bispectrum and F_OIT based detection functions are represented by (-), (..) and (.-), respectively.

Reliability is measured by the number of lost beginning (graphic 3) and ending (graphic 4) points. We consider a beginning or an ending point to be lost if the error is greater than three frames (800 samples). As we can see, for all SNRs, the HOS-based methods perform better than the energy-based one. The losses tend to increase as the SNR decreases. However, for T_b and F_OIT, the losses tend to decrease when the SNR is between 20 and 10 dB. This is because, for clean speech, some human noises (e.g., breaths, clicks) in the neighborhood of the endpoints cause false detections. When the SNR is lower, the noise masks these human noises and the detection functions suppress their effect. For SNRs lower than 10 dB, noise is the main cause of losses.

From the accuracy point of view, we have distinguished between detection of silence as speech (+) and detection inside speech (-), i.e., speech segment loss. In graphic 1 we can see the scores for the beginning points. The HOS-based functions, especially F_OIT, tend to be more conservative: they tend to detect beginning points during the silence before the actual beginning. In general, all detection functions have a similar accuracy (about one frame). The energy-based function is a bit more accurate than the others. However, since the mean error is computed only over the detected beginnings, if we take this fact into account the HOS-based functions appear to offer a better compromise between reliability and accuracy, especially if a conservative strategy is adopted. In graphic 2 we show the scores for the ending points. The comments about ending point detection are similar to those for beginning points.

5. CONCLUSIONS 

The main objective of the present work is to study the possibility of substituting the energy-based detection function with one based on bispectral measures. The results obtained from the experiments show that this substitution is quite feasible. Some aspects remain to be studied, for instance, the performance for non-symmetrically distributed and, especially, non-stationary noises. These cases appear to be the most complicated for any kind of detection function. However, the HOS-based functions seem to perform quite well.

Figure 1: Fine error. Average number of beginning detections (error < 100 ms) in terms of the SNR. Energy (-), Integrated Bispectrum (..) and F_OIT (.-).


Graphic 2: Detected endings and mean error

Figure 2: Fine error. Average number of ending detections (error < 100 ms) in terms of the SNR. Energy (-), Integrated Bispectrum (..) and F_OIT (.-).


Graphic 3: Lost beginnings

Figure 3: Gross error. Average number of beginning losses (error > 100 ms) in terms of the SNR. Energy (-), Integrated Bispectrum (..) and F_OIT (.-).


Graphic 4: Lost endings

Figure 4: Gross error. Average number of ending losses (error > 100 ms) in terms of the SNR. Energy (-), Integrated Bispectrum (..) and F_OIT (.-).
Abstract

The application of the fast translation-invariant modified rapid transform (MRT) in the feature extraction stage of an Information Symbols recognition system is described. Experimental results are given for the proposed recognition system applied to the recognition of Airport Passenger Orientation Symbols and Meteorological Symbols, including the dependence of recognition efficiency on the number of selected features and on noise.

Keywords: Rapid Transform, Modified Rapid Transform, ...

1. Introduction

Transformation methods can be used to obtain alternative descriptions of signals. These alternative descriptions have many uses, such as classification, redundancy reduction, coding, etc., because some of these tasks can be better performed in the transform domain [1].

Various transformations have been suggested as a solution to the problems of high feature-vector dimensionality and long computation time. More recently, the modified rapid transform (MRT) [2] was presented to break undesired invariances of the rapid transform (RT) [3].

In this paper, a new method for the recognition of Information Symbols using the MRT is presented. We apply the MRT in the feature extraction stage of the Information Symbols recognition process. Some properties of the RT and MRT are reviewed first; then the new method of recognition of Information Symbols is presented. Finally, experimental results are given for the proposed pattern recognition method applied to the recognition of Airport Passenger Orientation Symbols and Meteorological Symbols, including the dependence of recognition efficiency on the number of selected features and on noise.

2. Modified rapid transform

Transforms that do not change under cyclic shifts of the sequence are called translation invariant. Fast translation-invariant transforms are a valuable tool for purely shape-specific feature extraction in pattern recognition problems. These transforms may be used to extract features of one- or two-dimensional patterns that are invariant under cyclic permutations, so as to characterize objects independently of their position. In the fields of pattern recognition and scene analysis, a well-known class of fast translation-invariant transforms is the class of certain transforms (CT) [4], based on the original rapid transform (RT) [3] but using other pairs of simple commutative operators. The RT results from a minor modification of the Walsh-Hadamard transform (WHT). The signal flow graph of the RT is identical to that of the WHT, except that the absolute value of the output of each stage of the iteration is taken before feeding it to the next stage. This is not an orthogonal transform, as no inverse exists. With the help of additional data, however, the signal can be recovered from the transform sequence, i.e., an inverse rapid transform can be defined [5]. The RT has some interesting properties, such as invariance to cyclic shifts, to reflection of the data sequence, and to slight rotation of a two-dimensional pattern. It is applicable to both binary and analogue inputs, and it can be extended to multiple dimensions. More recently, the modified rapid transform (MRT) [2] was introduced, which can distinguish many more patterns from one another than the original RT can. The MRT was presented to break undesired invariances of the RT, which lead to a loss of information about the original pattern. This is achieved by combining the RT with preprocessing steps that use an asymmetric neighbor operator a. This operator breaks the undesirable invariances while keeping the shift invariance of the MRT. Using symbolic notation, we can introduce the MRT as follows:



Fig. 1. Signal graph of the MRT 


The signal graph of the MRT (Fig. 1) results from the signal graph of the RT by adding, in general, k preprocessing steps x' = a x. Each step maps the element x(i) of the input vector x to the element x'(i) of the vector x' by operating on the elements x(i), x(i+1) and x(i+2):

$$ x'(i) = f_o\big(x(i), x(i+1), x(i+2)\big) \qquad (1) $$

It is important that the operator f_o be asymmetric, because we want to destroy the invariance of the RT under reflection. The operator f_o may be realized in the following simple manner:

$$ x'(i) = f_o\big(x(i), x(i+1), x(i+2)\big) = x(i) + |x(i+1) - x(i+2)| \qquad (2) $$

The transform process of the MRT (Fig. 1), identical to that of the RT, requires N = 2^n input pixels, where n is a positive integer. Each column of the transform process in Fig. 1 corresponds to a particular computational step, and n steps are required. In general, the variables x^{(r)} in any column r are calculated from the variables x^{(r-1)} in the preceding column r-1 by

$$ x^{(r)}(i+2js) = f_1\big(x^{(r-1)}(i+2js),\; x^{(r-1)}(i+(2j+1)s)\big) $$
$$ x^{(r)}(i+(2j+1)s) = f_2\big(x^{(r-1)}(i+2js),\; x^{(r-1)}(i+(2j+1)s)\big) \qquad (3) $$

where the operators f_1, f_2 for the MRT (or RT) are

$$ f_1(a,b) = a + b, \qquad f_2(a,b) = |a - b| \qquad (4) $$

and s = 2^{n-r}, t = 2^{r-1}; i = 0, ..., s-1; j = 0, ..., t-1. Here x = x^{(0)} are the input data (pixels) and x^{(n)} = MRT{x} are the spectral coefficients of the MRT.
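A 1-D sketch of the butterfly of Eqs. (3)-(4) and of one asymmetric preprocessing step x'(i) = x(i) + |x(i+1) - x(i+2)|. We assume the indices in the preprocessing step wrap around cyclically; this is needed for the shift invariance to survive. The 2-D MRT used for the symbols is not reproduced here. The demo shows that the RT cannot tell [1, 2, 0, 0] from its mirror image while the MRT can, and that the MRT is unchanged by a cyclic shift.

```python
def rapid_transform(x):
    """Rapid transform (RT) of a length N = 2**n sequence via Eqs. (3)-(4).

    At step r the pair (i + 2js, i + (2j+1)s) becomes (a + b, |a - b|),
    with s = 2**(n - r), t = 2**(r - 1), 0 <= i < s, 0 <= j < t.
    """
    size = len(x)
    n = size.bit_length() - 1
    if size == 0 or 2 ** n != size:
        raise ValueError("length must be a power of two")
    v = list(x)
    for r in range(1, n + 1):
        s, t = 2 ** (n - r), 2 ** (r - 1)
        for j in range(t):
            for i in range(s):
                p, q = i + 2 * j * s, i + (2 * j + 1) * s
                v[p], v[q] = v[p] + v[q], abs(v[p] - v[q])  # f1, f2 of Eq. (4)
    return v


def preprocess(x):
    """One step x'(i) = x(i) + |x(i+1) - x(i+2)|, indices taken cyclically (assumed)."""
    size = len(x)
    return [x[i] + abs(x[(i + 1) % size] - x[(i + 2) % size]) for i in range(size)]


def mrt(x, steps=1):
    """Modified rapid transform: `steps` preprocessing passes followed by the RT."""
    for _ in range(steps):
        x = preprocess(x)
    return rapid_transform(x)


x = [1, 2, 0, 0]
mirrored = x[::-1]           # [0, 0, 2, 1]
shifted = x[-1:] + x[:-1]    # [0, 1, 2, 0]
print(rapid_transform(x), rapid_transform(mirrored))  # [3, 1, 3, 1] [3, 1, 3, 1]
print(mrt(x), mrt(mirrored), mrt(shifted))            # [7, 1, 3, 1] [7, 3, 1, 1] [7, 1, 3, 1]
```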

The MRT can be applied in all areas where the RT (or any transform from class CT) can be used. Some undesired invariances of the RT can be destroyed by applying only one preprocessing step. Experiments with the MRT [2,6] in character recognition showed that the MRT can distinguish many more patterns from one another than the RT or the Fourier power spectrum.

3. The Information Symbols 
Recognition System Model 

The recognition system is simulated on a digital computer using the program package CT-CAD [7]. It contains the following subsystems (Fig. 2):



Fig. 2. The MRT recognition system


1. The original digital picture preprocessing system CSPO-III is used to accept the physical input picture and transduce it into a measurable matrix. CSPO-III divides a visual pattern into small elements and, after suitable preprocessing, produces an N×N matrix over the binary field; each element becomes 1 or 0 depending on whether it is black or white.

2. The MRT processor, according to its function, may also be called a feature extractor. A 2-D MRT of all binary prototypes is taken in this stage. Then feature selection is carried out in the MRT "spectral" domain on various bases (maximum value of the spectral coefficients, variance zonal sampling, and interclass standard deviation).

3. In the teaching process, the selected MRT features of the binary pictures (symbols) are fed into the memory. Thus the memory unit learns the a priori knowledge of each class before the system can be used to make any decision. In the recognition process, the selected MRT features are fed into the classifier, which discriminates each pattern (symbol) and assigns a category (a class) to it by some decision rule. We use a simple classifier based on the cross responses d_kl between two different patterns from classes k and l, defined in the next section.

4. Recognition of Information 
Symbols 

The proposed Information Symbols recognition 
system was tested on the two classes of selected 
symbols: 

1. Airport Passenger Orientation Symbols (the class consists of M = 11 independent symbols) (Fig. 3).

Fig. 3. The Airport Passenger Symbols

2. Meteorological Symbols (the class consists of M = 16 independent symbols) (Fig. 4).

Fig. 4. The Meteorological Symbols

We implemented feature extraction with the MRT on both sets of Information Symbols. In general, the efficiency of feature extraction can be assessed by the system confusion matrix D = {d_kl : k, l = 1, ..., M}, where d_kl are the cross responses (or the distances between any two different symbols k, l in the feature space) and M is the number of classes, i.e., the number of different symbols. The confusion matrix can be calculated in the two steps shown below:

A. All M prototypes of Information Symbols, each represented by a binary N×N matrix x_k(i,j), with i, j = 1, ..., N; k = 1, ..., M; and M = 11 or M = 16, are transformed to the MRT domain

$$ X_k(i,j) = \mathcal{T}\{x_k(i,j)\} $$

where \mathcal{T} = MRT.

B. The cross response between two different symbols from classes k and l is defined as follows:

$$ d_{kl} = \sum_{i,j=1}^{N} \cdots $$
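The exact form of d_kl is not legible here, so the following nearest-prototype sketch assumes a common choice: the sum of absolute differences over the selected coefficients. Coefficients are ranked by interclass standard deviation, one of the selection bases listed above. The confusion matrix D is then the table of d_kl over prototype pairs. The prototype vectors stand for flattened MRT coefficient arrays and are toy numbers, not data from the paper.

```python
import statistics


def select_features(prototypes, count):
    """Indices of the `count` coefficients with the largest interclass standard deviation."""
    size = len(prototypes[0])
    spread = [statistics.pstdev([p[i] for p in prototypes]) for i in range(size)]
    return sorted(range(size), key=lambda i: spread[i], reverse=True)[:count]


def cross_response(a, b, idx):
    """Assumed d_kl: sum of absolute differences over the selected coefficients."""
    return sum(abs(a[i] - b[i]) for i in idx)


def classify(sample, prototypes, idx):
    """Index of the prototype with the smallest cross response to the sample."""
    return min(range(len(prototypes)), key=lambda k: cross_response(sample, prototypes[k], idx))


# Toy flattened MRT coefficient vectors for three classes (illustrative numbers).
prototypes = [[10, 0, 5, 1], [0, 10, 5, 1], [5, 5, 5, 9]]
idx = select_features(prototypes, 2)
confusion = [[cross_response(a, b, idx) for b in prototypes] for a in prototypes]
print(idx, classify([9, 1, 5, 1], prototypes, idx))  # [0, 1] 0
```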

The results of the experiments on the dependence of recognition efficiency on the number of selected features and on the influence of noise are shown in Tab. 1 and Tab. 2. A set of 165 symbols was used for testing and teaching purposes; the testing set used in Tab. 1 contains 5 noisy versions of each Airport Passenger Orientation Symbol.

Tab. 1 Recognition of Airport Passenger Orientation symbols (the class consists of 11 independent symbols)

Noise  Noised      Number of       Recogn.  Comment
       teach. set  coefficients    effic.
0%     No          1 (0.098%)      100%
1%     No          10 (0.98%)      100%
1%     Yes         8 (0.78%)       100%
2%     No          >40 (>18.75%)   95%      Z6->Z4 ...3x, Z7->Z1 ...1x
2%     Yes         8 (0.78%)       98.18%   Z8->Z7 ...1x
       Yes         16 (1.56%)      100%
3%     Yes         ...             100%
4%     Yes         18 (1.76%)      100%

A set of 240 symbols was used for testing and teaching purposes; the testing set used in Tab. 2 contains 5 noisy versions of each Meteorological Symbol.


Tab. 2 Recognition of Meteorological symbols (the class consists of M = 16 independent symbols)

Noise  Noised      Number of        Recogn.  Comment
       teach. set  coefficients     effic.
0%     No          1 (0.4%)         100%
1%     No          10 (4.0%)        96.25%   M13->M3 ...1x
1%     No          13 (5.0%)        100%
1%     Yes         8 (3.125%)       100%
2%     No          >48 (>18.75%)    95%      M6->M4 ...3x, M7->M1 ...1x
2%     Yes         8 (3.125%)       100%
3%     Yes         >40 (>15.625%)   97.5%    M3->M10 ...1x, M14->M2 ...1x


The results of both experiments may be summarized as follows:

A. Only one preprocessing step in the MRT signal graph is sufficient to destroy the undesired invariances and to significantly improve the capability of the MRT to distinguish many more patterns from one another than the original RT.

B. Even though a very simple classifier was used, a recognition efficiency of 97%-100% can be obtained by selecting only a couple of features (0.4%-5% of the number of MRT coefficients) in the MRT spectral domain, even if the symbols are corrupted by 1%-3% noise.


5. Conclusion

We apply the MRT in the feature extraction stage of an Information Symbols recognition system. Experiments with the recognition of two classes of symbols (Airport Passenger Orientation Symbols and Meteorological Symbols) demonstrate that, even though a very simple classifier was used, very high recognition efficiency can be obtained by selecting only a couple of features in the MRT spectral domain, even if the symbols are corrupted by noise.
ABSTRACT

In this paper an efficient open hashing function is developed using a combination of dynamical systems analysis and number theory. The new hash function appears to nearly match the optimal double divide hash function for uniform data distributions, and performs significantly better for clustered data distributions. A higher integer Lyapunov exponent for initial data probes is indicative of this improved cluster hashing behavior. The number of mathematical operations per probe in the new hash function matches that of double division hashing.

1. INTRODUCTION 

Hash functions are ubiquitous in the field of Computer Science. They are widely used to gain rapid access to databases, operating systems, compilers, and a range of business and scientific applications. Despite the importance of these functions, only a primitive theoretical understanding of what makes a good hash function exists. This research began with the premise that hash functions might better be analyzed using measures from the study of non-linear, chaotic systems. The results presented indicate that it is possible to build an efficient open hash function that performs better on clustered data than the commonly used double divide hash function.

A hash table is a well-known data structure used to 
maintain dynamic dictionaries. A dynamic dictionary 
is used to manage a collection of data items (each of 
which has a unique key value) that can be accessed 
according to the following operations: 

1. Search(k, S). Returns the data item with key k in dynamic dictionary S.

2. Insert(x, S). Adds data item x to dynamic dictionary S.

3. Delete(k, S). Removes the data item with key k from dynamic dictionary S.

The hash table data structure consists of an array T whose N slots are used to store the collection of data items. When implementing the above operations, an index is computed from the key value using an ordinary hash function h, which performs the mapping

h : U → {0, 1, ..., N-1}

where U denotes the set of all possible key values (i.e., the universe of keys). Thus, h(k_i) denotes the index, or hash value, computed by h when it is supplied with key k_i ∈ U. Furthermore, one says that k_i hashes to slot T[h(k_i)] in hash table T.

Since |U| is generally much larger than N, h is unlikely to be a one-to-one mapping. In other words, it is very probable that for two keys k_i and k_j, where i ≠ j, h(k_i) = h(k_j). This situation, where two different keys hash to the same slot, is referred to as a collision. Since two items cannot be stored in the same slot of a hash table, the Insert operation must resolve collisions by relocating an item in such a way that it can be found by subsequent Search and Delete operations.

One method of resolving collisions, termed open addressing by Peterson [5], involves computing a sequence
of hash slots rather than a single hash value. This 
sequence is successively examined, or probed, until an 
empty hash table slot is found in the case of an Insert 
operation, or the desired item is found in the case of 
Search or Delete operations. 
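A sketch of the three dictionary operations under open addressing, for integer keys. The probe function H(k, i) = (k mod N + i·(1 + k mod (N-1))) mod N is a standard double-hashing choice used only for illustration, not the hash function developed in this paper. Deleted slots are marked with a tombstone so that later searches keep probing past them, a common technique the text does not discuss.

```python
class OpenAddressTable:
    """Dynamic dictionary over integer keys using open addressing with double hashing."""

    _EMPTY = object()
    _DELETED = object()  # tombstone: keeps probe chains intact after Delete

    def __init__(self, size=13):  # prime size -> every probe sequence is full length
        self.size = size
        self.keys = [self._EMPTY] * size
        self.values = [None] * size

    def _probe(self, key):
        h1, h2 = key % self.size, 1 + key % (self.size - 1)
        return ((h1 + i * h2) % self.size for i in range(self.size))

    def search(self, key):
        for slot in self._probe(key):
            if self.keys[slot] is self._EMPTY:
                return None
            if self.keys[slot] == key:
                return self.values[slot]
        return None

    def insert(self, key, value):
        free = None
        for slot in self._probe(key):
            k = self.keys[slot]
            if k is self._EMPTY or k is self._DELETED:
                if free is None:
                    free = slot  # first reusable slot on the probe path
                if k is self._EMPTY:
                    break
            elif k == key:
                self.values[slot] = value  # key already present: update
                return
        if free is None:
            raise OverflowError("hash table is full")
        self.keys[free], self.values[free] = key, value

    def delete(self, key):
        for slot in self._probe(key):
            if self.keys[slot] is self._EMPTY:
                return False
            if self.keys[slot] == key:
                self.keys[slot], self.values[slot] = self._DELETED, None
                return True
        return False


table = OpenAddressTable(13)
for key, val in [(1, "a"), (14, "b"), (170, "c")]:
    table.insert(key, val)   # all three keys start probing at slot 1
table.delete(14)             # leaves a tombstone in slot 4
print(table.search(170), table.search(14))  # c None
```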

Typically, in open addressing, the ordinary hash function discussed above is modified so that it uses both a key and a probe number when computing a hash value. This additional information is used to construct the probe sequence. That is, in open addressing, hash functions perform the mapping

h : U × {0, 1, ..., N-1} → {0, 1, ..., N-1}

and produce the probe sequence < h_0(k), h_1(k), h_2(k), ... >. Because the hash table contains N slots, there can be at most N unique elements in a probe sequence. A full-length probe sequence is defined to be a probe sequence < H(k,1), H(k,2), ..., H(k,N) > that visits all N table entries after only N probes.
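For double hashing, H(k, i) = (h1(k) + i·h2(k)) mod N, full length depends only on the step h2(k): the sequence visits all N slots exactly when gcd(h2(k), N) = 1, which is why a prime N is a common choice. The sketch below uses double hashing purely as an example and numbers probes from 0 rather than 1.

```python
import math


def probe_sequence(h1, h2, n):
    """Double-hashing probe sequence (h1 + i*h2) mod n for i = 0, ..., n-1."""
    return [(h1 + i * h2) % n for i in range(n)]


def is_full_length(seq, n):
    """True when the sequence visits all n slots within n probes."""
    return len(set(seq)) == n


def full_length_by_gcd(h2, n):
    """Number-theoretic criterion: the step must be coprime with the table size."""
    return math.gcd(h2, n) == 1


print(probe_sequence(3, 5, 13))   # visits all 13 slots
print(probe_sequence(3, 4, 8))    # [3, 7, 3, 7, 3, 7, 3, 7]
```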

A general form for dynamical systems is given by
the first-order recurrence relation

$$x_{n+1} = f(x_n), \qquad x_0 = c \qquad (1)$$

where the constant c is the initial condition and $f : \mathbb{R} \rightarrow \mathbb{R}$. The function f generally must be non-linear
to generate complex behavior. This simple system is
called an iterator. It is well known that for some choices
of even simple f in equation (1), a system that exhibits
extremely complex behavior can be obtained. One such
form of behavior is referred to as chaos. While a universally accepted definition of chaos does not exist, it is
generally agreed that one characteristic is sensitive dependence on initial conditions, coupled with bounded
behavior [4]. Qualitatively, an iterator is said to be
sensitive to initial conditions if the orbits that result
from two arbitrarily close initial conditions
are distinctly different. The technique most often used
to detect this type of behavior involves computing the
Lyapunov exponent of system (1), which will be defined
in section 3.
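For a one-dimensional iterator with differentiable f, the Lyapunov exponent is commonly estimated as the orbit average λ ≈ (1/n) Σ ln|f'(x_j)|, taken over j = 0, ..., n − 1; a positive value signals sensitive dependence on initial conditions. The sketch below uses that standard textbook estimate (it is not the integer-domain variant the authors mention later), and the function name and parameters are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Estimates the Lyapunov exponent of x_{n+1} = f(x_n), x_0 = c, as
//   lambda ~= (1/n) * sum_{j=0}^{n-1} ln |f'(x_j)|
// The caller supplies f and its derivative df.
double lyapunov_estimate(const std::function<double(double)>& f,
                         const std::function<double(double)>& df,
                         double c, int n) {
    assert(n > 0);
    double x = c;
    double sum = 0.0;
    for (int j = 0; j < n; ++j) {
        sum += std::log(std::fabs(df(x)));  // local stretching rate at x_j
        x = f(x);                           // advance the orbit
    }
    return sum / n;
}
```

For a linear translation such as f(x) = x + 0.3 (so f' = 1) every term is ln 1 = 0 and the estimate is exactly 0, while for the logistic map f(x) = 4x(1 − x) the estimate approaches ln 2 ≈ 0.693 as n grows.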

2. OPEN HASHING 

Open hashing is an insertion strategy for resolution of
data collisions based on probing of hash table entries
until an empty table slot is found. The hash function
H(k, i) is used to denote a probing hash function, where
k is the key associated with the data being inserted and
i is the probe index. Knuth [3] notes that the desirable
properties of an open hash function include:

• Efficient hash function evaluation time.

• A long probe sequence to accommodate tables
near capacity.

• Different probe sequences for each data item to
avoid primary and secondary clustering.

• Even data distribution over the entire table size
for both initial and subsequent probes. This property is widely known as the uniform hashing property [1].

3. CHAOTIC MEASURES AND 
DYNAMICAL SYSTEMS 

The assertion that hash functions and chaotic iterators
share some of the same desired properties was first put
forth in [2]. The authors suggest that a chaotic iterator which exhibits sensitive dependence on initial conditions might also perform well as a hash function. The
authors introduce the notion that hash functions can be
transformed into chaotic iterators in the real domain,
allowing some measures from the field of non-linear dynamics to be applied. This was done by converting the
hash functions to iterators in the continuous domain,
and then applying the continuous Lyapunov exponent
to the resulting iterator [4]. The results showed that
the corresponding double hashing iterator had a positive Lyapunov exponent in the real domain, indicating
that this iterator has sensitive dependence on initial
conditions. Similar tests for linear hashing indicated
that it had a zero Lyapunov exponent, or no sensitive
dependence on initial conditions.

Additional work done by the present authors in the integer domain indicates that measurement of the Lyapunov
exponent, modified slightly for the integer domain, does
provide an indicator of distance between iterations, and
therefore provides a useful measure. The details of this
measurement and analysis are too long to include in
this abbreviated paper. The relevance of this Lyapunov
measure to open hash functions is that it measures
the ability of the open hash function to
quickly distribute data during successive probes. Further, our analysis shows that the most commonly used
open hash function, double divide hashing, actually has
a low Lyapunov exponent as measured over the first few
probes of a sequence. This indicates that the double divide
hash function tends to place clustered data close together during successive probes, leading to poor performance for some clustered data configurations. This
key result was then used to develop our exponential
hash function.
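The clustering claim can be illustrated with one common form of double divide hashing, H(k, i) = (h1(k) + i·h2(k)) mod N with h1(k) = k mod N and h2(k) = 1 + (k mod (N − 2)). This particular form is an assumption, since the paper does not spell out its h1 and h2, and the sketch measures slot distance directly rather than the authors' integer-domain Lyapunov measure. For adjacent keys k and k + 1, both h1 and h2 usually differ by exactly 1, so the two probe sequences stay only i + 1 slots apart at probe i.

```cpp
#include <cassert>
#include <cstdint>

// Assumed double divide hashing for a prime table size N > 2.
std::uint64_t double_divide(std::uint64_t k, std::uint64_t i, std::uint64_t N) {
    assert(N > 2);
    const std::uint64_t h1 = k % N;
    const std::uint64_t h2 = 1 + k % (N - 2);  // never 0, so every probe moves
    return (h1 + (i % N) * h2) % N;
}

// Forward distance from slot a to slot b around a table of size N.
std::uint64_t slot_gap(std::uint64_t a, std::uint64_t b, std::uint64_t N) {
    return (b + N - a) % N;
}
```

With N = 2027, keys 100 and 101 land in neighboring slots on probe 0 and are still only 5 slots apart on probe 4: successive probes spread nearby keys apart only linearly.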

4. AN EXPONENTIAL HASHING 
FUNCTION 

The Lyapunov measurements above led to the development of a better hash function for use on clustered data.
Two alternatives exist: either choose non-linear functions modulo N for $h_1(k)$ and $h_2(k)$, or create a non-linear modulo N probe function. The problem with the
first approach is that the hash functions used in double hashing must be quickly evaluated, yet must also
preserve uniform distribution of the hashed data in the
table space, both described in section 2. It is difficult
to create a non-linear modulo N function that meets
all three criteria. A good choice would appear to be
$h(k) = k^m \bmod N$, where N is prime and the exponent m is chosen to
be relatively prime to $N - 1$. This
function, similar to that used in public key encryption,
appears to be a good choice because it is a permutation of the values $[2, \ldots, N-1]$, because it is non-linear,
and because it uniformly distributes the data. However, evaluating the integer exponent is much
more expensive than the simple divide hash function
$h(k) = k \bmod N$, requiring a multiplication and division for each bit of m versus a single division [7]. Moreover, two different hash functions must
be evaluated for each key accessed. Clearly this would
be a poor choice in terms of performance relative to
commonly used methods.
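The cost argument above can be made concrete with square-and-multiply exponentiation, which performs one squaring per bit of m plus one extra multiplication for each set bit, each followed by a reduction modulo N. The sketch below is illustrative (the function names are not from the paper, and N is assumed prime and below 2^32 so products fit in 64 bits); it also checks the permutation property on [2, N − 1].

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Square-and-multiply: one squaring per bit of m, plus one multiply for each
// set bit, each followed by a reduction mod n. Requires n < 2^32 so every
// product fits in 64 bits.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t m, std::uint64_t n) {
    assert(n > 1 && n < (std::uint64_t{1} << 32));
    std::uint64_t result = 1;
    base %= n;
    while (m > 0) {
        if (m & 1) result = result * base % n;
        base = base * base % n;
        m >>= 1;
    }
    return result;
}

// Exponent hash h(k) = k^m mod N (N prime, gcd(m, N - 1) = 1 assumed).
std::uint64_t exponent_hash(std::uint64_t k, std::uint64_t m, std::uint64_t N) {
    return pow_mod(k, m, N);
}

// True if k -> k^m mod N maps [2, N - 1] one-to-one onto [2, N - 1].
bool permutes_range(std::uint64_t m, std::uint64_t N) {
    std::vector<bool> seen(N, false);
    for (std::uint64_t k = 2; k < N; ++k) {
        const std::uint64_t h = exponent_hash(k, m, N);
        if (h < 2 || seen[h]) return false;
        seen[h] = true;
    }
    return true;
}
```

For N = 23, m = 3 is relatively prime to 22 and yields a permutation, whereas m = 2 does not (k and N − k collide).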

What is needed is a hash probe function that has
a large Lyapunov exponent as evaluated over the first
few iterations, rather than over the entire range of N. This
is based on the fact that while double hashing has a
nearly ideal Lyapunov exponent when evaluated over
the whole table, its worst Lyapunov measure is in the
first few probe iterations. In addition, the hash function should preserve all of the desirable characteristics
of double hashing, including fast run time, long probe
sequences, and no primary or secondary clustering for
similar keys. The function cannot be a linear function of i, or it would suffer from the same limitation
as double hashing. Some introductory number theory
is necessary to develop a new hash probing function
based on the calculation of an exponent modulo N.

Definition 1 (cyclic group, generator) If a group G
contains an element a such that every element of G is
of the form $a^k$ for some integer k, then G is a cyclic
group, and a is called a generator of G [6].

The group $\mathbb{Z}_p^*$, consisting of the elements $\{1, 2, \ldots, p-1\}$
with the operator *, which is ordinary multiplication modulo p (p prime), forms a cyclic group. This is a standard
result of number theory: every prime p has a primitive root, that is, a generator of $\mathbb{Z}_p^*$. The hash
function is:

$$H(k, i) = h(k)^i \bmod N \qquad (2)$$

where h(k) is a hash function returning values in the
range $[2, \ldots, N-1]$. Equivalently, expressed as an iterator
using the $x_i$ notation previously defined:

$$x_i = x_0^i \bmod p \qquad (3)$$

where $p = N$ must be a prime hash table size. This
function is similar to the RSA and ElGamal cryptosystems [7], in that a finite field exponent is used to create a non-linear permutation of values. This probe sequence has the following characteristics:

• It can be computed efficiently. The value $x_i$ at
the i-th step is simply the previous value $x_{i-1}$
times $x_0$ modulo N. This requires the same number of mathematical operations as linear or double hashing.

• The probe sequence is non-linear. Small perturbations in the initial value $x_0$ become large differences after only two iterations.

• The probe sequence depends entirely on the initial hash value $x_0$, which may lead to primary and
secondary clustering. Fortunately this can easily
be remedied by adding a second hash function
value $h_2(k)$, as will be demonstrated shortly.

• The probe sequence does not reach its maximum length, $p - 1$, for all values of $x_0$, since only values of $x_0$ that are generators
of $\mathbb{Z}_p^*$ generate the full domain.
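The incremental evaluation described in the first bullet, and the dependence of the sequence length on whether $x_0$ is a generator, can be sketched as follows. The function names are illustrative, p is assumed prime and below 2^32, and $x_0$ is assumed to lie in [2, p − 1]; the probes are taken to be $x_0, x_0^2, x_0^3, \ldots$ (mod p).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exponential probe sequence of equation (3): x_0, x_0^2, x_0^3, ... (mod p).
// Each step costs one multiply and one reduction: x_i = x_{i-1} * x_0 mod p.
std::vector<std::uint64_t> exponential_probes(std::uint64_t x0, std::uint64_t p,
                                              std::size_t count) {
    assert(p > 2 && p < (std::uint64_t{1} << 32) && x0 >= 2 && x0 < p);
    std::vector<std::uint64_t> probes;
    probes.reserve(count);
    std::uint64_t x = x0;
    for (std::size_t i = 0; i < count; ++i) {
        probes.push_back(x);
        x = x * x0 % p;  // next power of x_0
    }
    return probes;
}

// Number of distinct slots visited before the sequence repeats. This is the
// multiplicative order of x_0, and equals p - 1 only when x_0 is a generator.
std::uint64_t probe_cycle_length(std::uint64_t x0, std::uint64_t p) {
    assert(p > 2 && p < (std::uint64_t{1} << 32) && x0 >= 2 && x0 < p);
    std::uint64_t x = x0;
    std::uint64_t length = 1;
    while (x != 1) {  // x_0^length == 1 closes the cycle
        x = x * x0 % p;
        ++length;
    }
    return length;
}
```

With p = 23, $x_0$ = 5 (a generator) visits all 22 non-zero slots, $x_0$ = 2 visits only 11, and $x_0$ = 22 = p − 1 alternates between two slots.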

4.1. THEORETICAL PERFORMANCE 

First, consider the probe length of the groups and subgroups that equation (3) generates:

Definition 2 (order) The number of unique elements
in a group is called the order of the group. The group
$\mathbb{Z}_p^*$ has order $p - 1$.

Definition 3 (subgroup) A subset H of a group G is
a subgroup of G if H is itself a group relative to the
binary operation defined in G.

Theorem 1 (Lagrange's Theorem) If G is a group of
order N, then the order of every subgroup H of G is a
divisor of N.

Lemma 1 The number of generators for a cyclic group
of order N is $\varphi(N)$, where $\varphi(N)$ denotes the Euler function: the number of positive integers less than N which are
relatively prime to N.

Ideally, all of the elements of $\mathbb{Z}_p^*$ should be generators
of the entire group, for this would imply that every
element could be generated starting with any element.
This would mean that every element leads to a probe
sequence of length $p - 1$. Applying the above to the new
hash function, it is readily apparent that $\mathbb{Z}_p^*$ is a group
of order $p - 1$, and that the number of generators for
the group is $\varphi(p - 1)$. Unfortunately, p must be prime
for $\mathbb{Z}_p^*$ to be a cyclic group containing all of the non-zero table slots. This means that $p - 1$ must
be an even number, since p is odd. Therefore $p - 1$ must
have 2 as one of its factors, and since no even number is relatively prime to $p - 1$,
at most $(p - 1)/2$ elements of $\mathbb{Z}_p^*$ can be generators of the
group. This can be easily verified empirically by trying
any small group of prime size.

Next, apply Lagrange's theorem to partially correct
this deficiency. Since $\mathbb{Z}_p^*$ is a group of order $p - 1$, all
subgroups H of $\mathbb{Z}_p^*$ must have orders that are divisors
of $p - 1$. However, carefully selecting $p - 1$ so that
it is the product of 2 and another prime number t will
ensure that every proper, non-trivial subgroup of $\mathbb{Z}_p^*$ has order either 2
or t. In other words, if the prime t is chosen so that
$p = 2t + 1$ is also prime, every element of $\mathbb{Z}_p^*$ other than the identity 1
will be either a generator of the entire group, a generator
of the subgroup of order t, or a generator of the subgroup of
order 2.

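Since the subgroup of order 2 is also cyclic, it has
only one generator in $\mathbb{Z}_p^*$. It is easy to see that the
value $x = p - 1$ is in fact the only element of $\mathbb{Z}_p^*$ which
can generate a subgroup of order 2, since $(p-1)^2 \equiv (-1)^2 = 1 \pmod{p}$ and $x^2 \equiv 1$ has only the solutions $\pm 1$ modulo a prime. These results are
summarized in theorem 2.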

Theorem 2 Given primes p and t with $p - 1 = 2t$,
the group $G = \mathbb{Z}_p^*$ contains exactly $t - 1$ generators for
the entire group G, $t - 1$ elements which are generators of
the subgroup of order t, one element ($p - 1$) which generates
a subgroup of order 2, and the identity element 1.
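Theorem 2 can be checked numerically by tabulating the multiplicative order of every element. The sketch below is illustrative (names are not from the paper; p is assumed prime and below 2^32). For p = 23 = 2·11 + 1 it finds one element of order 1 (the identity), one of order 2, ten of order 11 and ten of order 22.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Multiplicative order of x modulo a prime p: smallest e >= 1 with x^e = 1.
std::uint64_t order_mod(std::uint64_t x, std::uint64_t p) {
    assert(p > 2 && p < (std::uint64_t{1} << 32) && x >= 1 && x < p);
    std::uint64_t y = x;
    std::uint64_t e = 1;
    while (y != 1) {
        y = y * x % p;
        ++e;
    }
    return e;
}

// Histogram mapping each order to the number of elements of Z_p^* having it.
std::map<std::uint64_t, std::uint64_t> order_census(std::uint64_t p) {
    std::map<std::uint64_t, std::uint64_t> census;
    for (std::uint64_t x = 1; x < p; ++x) ++census[order_mod(x, p)];
    return census;
}
```

By Lagrange's theorem only the divisors 1, 2, t and 2t of $p - 1$ can appear, which is exactly what the census shows for any p of this form.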

Therefore, the following conclusions can be derived for
the exponential probe function in equation (3) by applying theorem 2.

• Roughly half ($t - 1$ of the $p - 1 = 2t$) of the choices for $x_0$ will be generators
for $\mathbb{Z}_p^*$, and will create probe sequences of length
$p - 1$.

• Exactly $t - 1$ of the values for $x_0$ will generate probe
sequences of length t.

• Only one value, $x_0 = p - 1$, will generate a poor
probe length of 2. This value can be avoided by
choosing the initial value $x_0 = (h \bmod (p-2)) + 1$. The identity $x_0 = 1$, which generates a sequence of length 1, must likewise be excluded.

• Different initial values $x_0$ will generate unique
probe sequences in $\mathbb{Z}_p^*$.

• The primes t and p with $p - 1 = 2t$ can be efficiently generated
using probabilistic primality testing. The Prime Number Theorem says that there
are approximately $N / \ln(N)$ prime numbers less than
N, so an integer near N is prime with probability about $1/\ln(N)$. Treating the primality of t and $2t + 1$ as independent, one would expect
to explore on the order of $\ln^2(N)$ candidate pairs to
find a suitable table size probabilistically. This
only needs to be done during initialization of the
table.
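The table-size search in the last bullet can be sketched with Miller-Rabin testing. This is a minimal sketch under stated assumptions, not the authors' code: it restricts sizes to $p < 2^{32}$ so that the witness set {2, 7, 61}, which is known to make Miller-Rabin deterministic in that range, can be used, and the names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Modular exponentiation by square-and-multiply (n < 2^32 keeps products in 64 bits).
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t n) {
    std::uint64_t result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1) result = result * base % n;
        base = base * base % n;
        exp >>= 1;
    }
    return result;
}

// Miller-Rabin with witnesses 2, 7 and 61, which is deterministic for n < 2^32.
bool is_prime(std::uint64_t n) {
    assert(n < (std::uint64_t{1} << 32));
    const std::uint64_t witnesses[] = {2, 7, 61};
    if (n < 2) return false;
    for (std::uint64_t a : witnesses) {
        if (n % a == 0) return n == a;  // also rejects even n > 2
    }
    std::uint64_t d = n - 1;
    int s = 0;
    while (d % 2 == 0) {  // write n - 1 = d * 2^s with d odd
        d /= 2;
        ++s;
    }
    for (std::uint64_t a : witnesses) {
        std::uint64_t x = pow_mod(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int r = 1; r < s && composite; ++r) {
            x = x * x % n;
            if (x == n - 1) composite = false;
        }
        if (composite) return false;  // a proves n composite
    }
    return true;
}

// Smallest p >= target with both p and t = (p - 1) / 2 prime, i.e. p = 2t + 1.
std::uint64_t next_safe_prime(std::uint64_t target) {
    for (std::uint64_t p = (target < 5 ? 5 : target);; ++p) {
        if (p % 2 == 1 && is_prime(p) && is_prime((p - 1) / 2)) return p;
    }
}
```

Starting from targets of 2000 and 3000, this search returns 2027 and 3023, the table sizes used in the experiments below.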

This new exponential probe function has many desirable characteristics. Except for the less than optimal
probe length on 1/2 of the table elements, it has many
of the characteristics of double hashing.

4.2. HASH TABLE EXPERIMENTS 

To test the above hypothesis, we implemented both
double hashing and the new exponential hash function.
Table sizes were determined using the double prime
criterion $N = p = 2t + 1$ required for the exponential hash function, where p and t are both prime.
The Miller-Rabin probabilistic primality test was used
to determine the next larger prime table size meeting
this criterion, given a target table size [7]. Based on the
earlier analyses for both functions, this should produce
optimal probe lengths. All trial runs were done by creating two identical empty hash tables of the same size.
Elements chosen at random from the data distributions
described below were successively added to the table to
achieve a load factor of 95% ($\alpha = 0.95$) of the table
size. The measure of merit was the average number of
probes required per element added. For example, if k
elements take a total of m probes, the average probes
per element is simply m/k. Samples of this metric were
taken every 5% of the table load factor, from 5% to 95%,
to determine the behavior as load factor increased.
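A small harness for the measure of merit m/k, sampled at each 5% step of load, might look as follows. It is a sketch, not the authors' test code: the name probe_profile, counting every probe of an insertion (including the final successful one), and storing repeated keys as separate elements are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Probe = std::function<std::size_t(std::uint64_t, std::size_t)>;

// Inserts keys into an empty table of size N and records the running average
// probes per element, m / k, each time the load factor reaches 5%, 10%, ..., 95%.
// Every key is stored as a new element, so repeated keys are allowed.
std::vector<double> probe_profile(std::size_t N, const std::vector<std::uint64_t>& keys,
                                  const Probe& probe) {
    std::vector<bool> used(N, false);
    std::vector<double> samples;
    std::size_t m = 0;  // total probes so far
    std::size_t k = 0;  // elements stored so far
    std::size_t next_pct = 5;
    for (std::uint64_t key : keys) {
        if (next_pct > 95) break;
        for (std::size_t i = 0; i < N; ++i) {
            ++m;
            const std::size_t slot = probe(key, i);
            if (!used[slot]) {
                used[slot] = true;
                ++k;
                break;
            }
        }
        // Record one sample for every 5% load threshold this insert reached.
        while (next_pct <= 95 && k * 100 >= next_pct * N) {
            samples.push_back(static_cast<double>(m) / static_cast<double>(k));
            next_pct += 5;
        }
    }
    return samples;
}
```

Passing the same key list to two probe functions (for instance a double hashing probe and an exponential probe) yields two profiles that can be compared sample by sample.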

Summaries of four experiments are presented here:

• Uniform data distribution over the entire table
size: to show that the exponential hash function and double hash function have statistically
equivalent performance for uniformly distributed
random data.

• Clustered data distribution: to demonstrate
improved performance of the exponential hash function over the double hash function for tightly clustered data.

• Variation of cluster size: to demonstrate the sensitivity of the improved exponential hash function performance to the size of the data cluster.

• Variation of table size with fixed percentage cluster: to demonstrate the sensitivity of the exponential
hash function to table size using a fixed data cluster size.

4.3. UNIFORM DATA DISTRIBUTION 

The control case for this analysis was a series of runs
done on a uniform data distribution with a fixed table
size. Two identical tables of equal size were created
and filled to 95% capacity, one using the double divide
hash function and the other using the exponential hash
function. The test was repeated 100 times, using a different random number seed for each run, to determine
whether any statistical difference in the total number of probes to
fill the table could be detected. A table size of 3023 was
used for these runs. The summary results are presented
in table 1.

With 100 runs, no statistically significant difference
in performance could be detected. The difference in
total number of probes between the two functions is
only 1.76%, which is significantly less than the 15.51%
standard deviation as measured between runs of the
same hash function. This means that the exponential hash function and double divide hash function are
statistically equivalent for randomly chosen, uniformly
distributed data.

Measure               Double        Exponential
Total Probes          1028281       1010148
Avg Probes Per Run    10282         10101
Std Deviation         [illegible]   257.7

Table 1: Uniform data comparison

4.4. CLUSTERED DATA DISTRIBUTION 

The second experiment involved clustering the data
over a sub-interval of the total table size, in an attempt
to simulate a dense data cluster. The dense data cluster represents many real-world data sets where data is
far from evenly distributed. A table size of p = [illegible]
was chosen, and all of the data was chosen at random
from a single data cluster of approximately 300 elements from the beginning of the data space. Samples
of the average number of probes per data element were
taken for every 5% of table size, up to a total table load
of 95%. The results of this test indicate that the exponential hash function outperforms the double divide hash
function. At high table load, the exponential hash
function stores data in as little as half the number of
probes required by the double divide hash function.

4.5. VARIATION OF CLUSTER SIZE 

This experiment was similar to the previous one, but
in this case the cluster size was varied from 2% to 20%
of the overall table size. Again, identical tables were
created and populated using both the double divide
hash function and the exponential hash function. Data
was taken at random from a cluster of size varying from
2% to 20% of the table size, and the average number of
probes per element inserted was sampled for each table
to reach 95% capacity. A table size of 2027 elements
was used for all experiments. The results show that the
exponential hash function uses far fewer probes than
the double divide hash. Further, the relative advantage
seems to be larger for more tightly clustered data.

4.6. VARIATION OF TABLE SIZE 

The next experiment varied the table size to see if it had
any effect on the relative performance of the two functions.
Tables were created with approximately 1000 to 15000 entries, and filled to 95% capacity using
the double hash and exponential hash functions. The
exponential hash function performed better for all table sizes, and table size appeared to have little effect
on the relative outcome.


5. CONCLUSIONS 

The results presented here indicate a new relationship
between chaotic iterator theory and
hash function performance. Results indicate that the
proposed exponential hash function does better than
double divide open hashing for some clustered data distributions, and performs as well for uniform data distributions. A number of avenues for future research are
now open. It is likely that other measures from non-linear systems theory can be applied to hash functions
and may also provide additional indicators of hash table performance, possibly leading to further improvements in the exponential open hash function presented
here. Further, it may eventually be possible to apply
chaotic measures to other hash function applications,
such as cryptographic signature verification, to detect
undesirable hash function characteristics. A full
version of this paper, with complete details of the Lyapunov exponent analysis and complete test results, is
ABSTRACT

In this paper, we present the DSP Software Engineering
(DSPSE) Fixed Arithmetic C++ Tool (FACT™) method for
modeling DSP fixed-point processors in a C++ environment
and converting a floating-point model to a given DSP fixed-point processor model. All DSP fixed-point processors have
associated with them a set of fixed bit length data
representations for the storage and manipulation of binary
information. We define a C++ class for each distinct fixed
bit length data representation of a given DSP fixed-point
processor, such that the behavior of the given DSP fixed-point processor can be achieved in a C++ environment using
the library of classes. For our own development, we have
most recently created a FACT™ library for the Texas
Instruments TMS320C54x DSP fixed-point processor. The
TMS320C54x library has been used in the development of
the Japanese Vector Sum Excited Linear Prediction
(JVSELP) algorithm and the International
Telecommunication Union G.728 standard algorithm.

I. INTRODUCTION 

With the explosive growth of the DSP market, we
have seen a direct increase in the use of fixed-point digital signal processors in a variety of
industries, such as telecommunications, speech/audio
processing, instrumentation, military, graphics, image
processing, control, automotive, robotics, consumer
electronics and medical technology. In general, fixed-point
DSPs are less expensive than floating-point DSPs,
use less power, and take less space. One advantage of a floating-point DSP is a smaller development cost (i.e. man-hours);
however, this comes at the price of a greater production cost. Thus, where
possible, companies are using and will continue to use fixed-point DSPs
for their products. In the near future, we will be faced more
and more with the challenge of real-time implementations of
complex DSP algorithms on fixed-point DSPs. The FACT™
procedure is the outcome of our desire to decrease the
development time of fixed-point implementations.


We will assume the following software development cycle 
model for the real-time implementation of a given algorithm 
on a fixed-point DSP: 

1) floating point model 

2) fixed point model 

3) real-time implementation. 

At DSPSE, our development time is drastically reduced 
using a FACT™ procedure with the above development 
model. By decreasing development time, we have narrowed 
the advantage gap between floating point DSPs and fixed- 
point DSPs. 

Besides being able to model a fixed-point DSP in a C++ 
environment, a FACT™ library expedites the conversion of 
an algorithm from a floating point model to a given fixed- 
point processor model; from step 1 to step 2 in our 
development model. For the FACT™ floating-point model 
we define a C++ class, say FLOAT. We attach various data 
members to our class FLOAT to keep track of pertinent 
information for transforming a floating point model to a 
fixed-point model. Moreover, if the floating-point
model of an algorithm calls N modules, then we need a fixed-point model for each of the N modules under each fixed-point processor we wish to model.

Situations will also arise when we will want to convert
only certain modules to a fixed-point processor model while
leaving other modules as a floating-point model, such as a
fixed-point encoder and a floating-point decoder. In order to
accomplish the dual existence of a fixed-point and floating-point
model, we create a C++ interface class that does exactly that:
it interfaces a fixed-point module with a floating-point module.
In terms of linear algebra, our interface class acts as a
transformation operator, transforming from a FACT™ fixed-point model space to a FACT™ floating-point model space.

The paper is organized as follows. In section II, we
discuss the creation of a FACT™ library. Within section III,
we further explain the FACT™ procedure by showing
examples in modeling the TMS320C54x. We introduce a
FACT™ floating-point model in section IV, and in section V we discuss the
transformation from a floating-point model to a FACT™
fixed-point model, followed by our conclusions.
II. Creating a FACT™ Library

A. Distinct Fixed Length Data Representation 

All fixed-point DSPs have associated with them a set of 
fixed bit length data representations for the storage and 
manipulation of binary information. A fixed bit length data 
representation is considered distinct if any of the following 
three conditions are met: 1) the length is different; 2) if the 
length is the same then an operation exists which will 
produce a different result given the same input value(s) of 
identical length and under the same control conditions; 3) if 
the length is the same then an operation exists which can not 
be performed on a data representation of the same length. 
By control conditions, we mean all status fields, control 
fields, mode of operation and the like. 

The reasoning for condition 1 is obvious: an L1-bit length
cannot exactly represent an L2-bit length unless L1 = L2.
Suppose L1 = 16 and L2 = 32; we cannot use 16 bits to
represent 32 bits. One might say that 32 bits can be used to
represent 16 bits. For example, let us assume we are using
the lower 16 bits of a 32-bit representation to simulate a 16-bit representation. From our point of view, the 16-bit
simulation is not the same as the actual 16-bit representation,
since we are concerned with bit-exact similarities. That is,
the 16-bit simulation really is 16 zeros followed by 16 binary
digits, as compared to just 16 binary digits. Condition 2
exists when L1 = L2 and at least one operation will produce
different results with the same identical inputs. For
example, a fixed-point DSP could have more than one
accumulator and, depending on which accumulator is an input
and/or an output, an operation produces different results.
Condition 3 exists when L1 = L2 and an operation cannot be
performed on all representations of the same length, just
some. Again, using the multi-accumulator example, at least
one operation exists that will not accept all accumulators as
an input. For example, on the TMS320C54x there are
instructions which will produce different results depending
on whether the source (input) or destination (output)
accumulator is A or B, even if the input value and the
control conditions are the same. And, as on most
processors, certain 16-bit registers cannot be
operated on in all the ways a 16-bit short data memory operand
can.
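The bit-exactness argument for condition 1 can be illustrated with a 16-bit addition that overflows. In the sketch below (illustrative names; two's-complement wraparound is assumed and saturation is not modeled), the sum computed in a 32-bit integer keeps the carry in bit 16, so it differs from what a true 16-bit representation would hold unless the result is explicitly wrapped back into 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Keeps only the low 16 bits of x and reinterprets them as a signed 16-bit
// two's-complement value (wraparound; saturation is not modeled).
std::int32_t wrap16(std::int32_t x) {
    const std::uint32_t low = static_cast<std::uint32_t>(x) & 0xFFFFu;
    return low >= 0x8000u ? static_cast<std::int32_t>(low) - 0x10000
                          : static_cast<std::int32_t>(low);
}

// A 16-bit add "simulated" in a 32-bit int: any carry lands in bit 16.
std::int32_t add16_in_32(std::int16_t a, std::int16_t b) {
    return std::int32_t{a} + b;
}

// A bit-exact 16-bit add: the result is wrapped back into 16 bits.
std::int32_t add16_exact(std::int16_t a, std::int16_t b) {
    return wrap16(std::int32_t{a} + b);
}
```

Adding 1 to 0x7FFF gives 32768 in the 32-bit simulation but -32768 in the bit-exact 16-bit version.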

The different fixed bit length data representations are
grouped into a set. We will refer to the set of fixed bit length
data representations for a given fixed-point DSP as the
length set vector, $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_M\}$, where each $\lambda_i$, for $i \in \{1, 2, \ldots, M\}$, is a positive integer equal to the length
of the distinct representation in bits. Thus, M is the total
number of distinct representations of information possible on
a given fixed-point processor. For example, on the
TMS320C54x the length set vector is $\Lambda_{C54x} = \{40, 40, 32, 16, 16\}$. Hence, we can conclude that the
TMS320C54x has five distinct data representations: two
being 40 bits in length, one 32 bits in length, and the other
two 16 bits in length. The two 40-bit lengths, the 32-bit length
and the two 16-bit lengths are due to the existence of 40-bit
accumulator A, 40-bit accumulator B, the ability to address
32-bit operands, 16-bit registers and the ability to address
16-bit operands, respectively.

B. C++ Class Hierarchy

As stated earlier, each distinct fixed bit length data
representation has an associated C++ class. Thus, each $\lambda_i$
has an associated C++ class which, if possible, is derived
from another class of the same bit length. The actual
procedure for deciding which class, if any, a given distinct
fixed bit length data representation is derived from is
developed in [2]. The authors in [2] use the projection
theorem by representing each distinct fixed bit length data
representation as a vector space.

The base class may be an abstract class, which allows pure
virtual function declarations, or the base class can define a
standard set of virtual operation definitions to be performed
on base class objects. The former choice is good
in applications where the end user must choose which DSP
fixed-point processor to model, since objects of an abstract
class cannot be created, while the latter is used in situations
where no specific processor is modeled, only the standard DSP
processor as determined by the library creator. That is,
objects of the standard class are allowed. The concept of the
base class becomes clearer as we explain the power
structure of class inheritance.

Suppose we want to create a library to model DSP fixed-point processors A, B and C. Let us assume that the length
set vectors for DSP processors A, B and C are

$$\Lambda_A = \{40, 40, 32, 16, 16\}, \quad M = 5, \qquad (1)$$

$$\Lambda_B = \{64, 40, 32, 16, 16\}, \quad M = 5, \qquad (2)$$

$$\Lambda_C = \{64, 40, 32, 32, 16, 16\}, \quad M = 6, \qquad (3)$$

respectively.

For the sake of brevity, we will go through the details of
creating only the class for the 64-bit length data
representation of DSP fixed-point processor B needed to
create the library. The same procedure is applied to the
other bit-length data representations. Furthermore, we will
assume that we have already created a 64-bit base class,
say I64, with virtual operator definitions. Thus, we need to
create a class, say I64_B, for the 64-bit length data
representation of fixed-point processor B.

The operators (i.e. instructions) to be defined in the I64_B
class can be grouped into two categories: (a) operators
already defined in the I64 class and (b) operators not defined
in the I64 class. You can think of the category (a) operators
as the projection of the I64_B operators onto the I64
operators. Of course, if the projection were the empty set, then
I64_B would not be derived from I64. Furthermore, the base
class should not have any operators that the I64_B
class should not implement. Analogous to linear algebra,
the previous statement implies that category (a) comprises
all of class I64, the base class, such that I64_B is the direct
sum of I64 and the category (b) operators. That is,

$$I64\_B = I64 \oplus \text{category (b)}. \qquad (4)$$

For our case, let us assume that the projection was not the
empty set and that all I64 operators are to exist in the I64_B
class, such that I64_B is derived from I64.

We can divide the I64_B operators into two orthogonal sets of
operators. The first set is obtained by taking the
operator projection of I64 onto I64_B. In our study, we will refer to the
operator projection of class $\alpha$ onto class $\beta$ as
$O(\alpha, \beta)$. The other set is the rest of the I64_B instructions,
which need to be initially defined for the implementation of
an I64_B object. Therefore, we can decompose our I64_B
class into the following:

$$I64\_B = O(I64, I64\_B) \oplus (I64 \mid I64\_B) \qquad (5)$$

The last term, $(I64 \mid I64\_B)$, is the set of operators which
need to be added to the I64_B class, what we refer to in
equation (4) as category (b) operators.
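The decomposition in equations (4) and (5) maps naturally onto C++ public inheritance: the derived class receives every base-class operator (category (a)) and declares only the operators the base lacks (category (b)). The sketch below is hypothetical; the text does not list the actual I64 or I64_B operators, so the wraparound add and the arithmetic shift are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit standard base class (I64 in the text). Its operators
// are the category (a) operators every derived 64-bit class inherits.
class I64 {
public:
    explicit I64(std::int64_t v = 0) : value_(v) {}
    virtual ~I64() = default;

    std::int64_t value() const { return value_; }

    // Category (a) example: two's-complement wraparound addition (assumed
    // behavior). Virtual, so a processor class may override it.
    virtual void add(const I64& other) {
        value_ = static_cast<std::int64_t>(static_cast<std::uint64_t>(value_) +
                                           static_cast<std::uint64_t>(other.value_));
    }

protected:
    std::int64_t value_;
};

// Processor B's 64-bit class: I64_B = I64 (+) category (b). It inherits all
// of I64 and adds operators that I64 does not define.
class I64_B : public I64 {
public:
    explicit I64_B(std::int64_t v = 0) : I64(v) {}

    // Category (b) example, invented for illustration: arithmetic right shift
    // (sign-propagating for negative values; guaranteed since C++20).
    void shift_right(unsigned n) {
        assert(n < 64);
        value_ >>= n;
    }
};
```

An I64_B object can be used wherever an I64 is expected, which is what lets processor-independent code operate on the category (a) operators alone.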

The same methodology is applied to the creation of the
rest of the classes until all 16 classes (five for A, five for B and six for C) are created. Once we
have these 16 classes, we have a library for modeling DSP
fixed-point processors A, B and C in a C++ environment.
One possible power structure of the class hierarchy for a
library to model fixed-point processors A, B and C is shown
in figure 1. We show the structure with two 16-bit
length standard base classes: I16 to mimic 16-bit length data
operands and R16 to mimic 16-bit length registers.



Figure 1: Power Structure of Class Hierarchy for Example Library


III. Examples of the TMS320C54x FACT™ Library

The length set vector for the TMS320C54x, $\Lambda_{C54x} = \{40, 40, 32, 16, 16\}$, contains five elements. Let us focus
on the accumulators. The TMS320C54x has two 40-bit
accumulators, referred to as accumulator A and accumulator
B. Each accumulator contains three
memory-mapped registers: guard bits (AG, BG), high-order
bits (AH, BH), and low-order bits (AL, BL). As shown in
figure 6 and figure 7, each accumulator layout consists of 8
guard bits followed by 16 high-order
and 16 low-order bits, bringing the total
length to 40.
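The accumulator layout just described (8 guard bits in positions 39 to 32, the 16-bit high-order word in 31 to 16, and the 16-bit low-order word in 15 to 0) can be expressed with shifts and masks. In this sketch the accumulator is held in the low 40 bits of a std::uint64_t; the function names are illustrative and are not part of the FACT™ library.

```cpp
#include <cassert>
#include <cstdint>

// Mask selecting the 40 accumulator bits of a uint64_t.
constexpr std::uint64_t kAcc40Mask = (std::uint64_t{1} << 40) - 1;

// Guard bits xG: accumulator bits 39..32.
std::uint8_t guard_bits(std::uint64_t acc) {
    return static_cast<std::uint8_t>((acc >> 32) & 0xFF);
}

// High-order word xH: accumulator bits 31..16.
std::uint16_t high_word(std::uint64_t acc) {
    return static_cast<std::uint16_t>((acc >> 16) & 0xFFFF);
}

// Low-order word xL: accumulator bits 15..0.
std::uint16_t low_word(std::uint64_t acc) {
    return static_cast<std::uint16_t>(acc & 0xFFFF);
}

// Reassembles the three memory-mapped registers into one 40-bit value.
std::uint64_t compose(std::uint8_t g, std::uint16_t h, std::uint16_t l) {
    return ((std::uint64_t{g} << 32) | (std::uint64_t{h} << 16) | l) & kAcc40Mask;
}
```

For the accumulator value 0x123456789A, the guard, high and low fields are 0x12, 0x3456 and 0x789A.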

The TMS320C54x I40A/B class is used to declare and
define operators and functions which utilize
TMS320C54x accumulator A or B. In other words, if an
assembly instruction is equivalent to an operator or function in the C++ model, the
final result of the instruction bit-matches the C++ model result.
Moreover, in the C++ model, we are able to explicitly state
whether a 40-bit variable resides in accumulator A or
accumulator B by creating two separate classes.

Our simulation is accomplished by using a 32-bit integer
and an 8-bit unsigned character in tandem as the data
members for our I40 structure, shown in figure 8. The 32-bit
integer is called guardhi, while the 8-bit character is called
low. As shown in that layout, guardhi contains the 32
MSBs of the accumulator and low contains the remaining 8
LSBs. In other words, guardhi contains the guard bits, the high-order bits, and the 8 MSBs of the low-order bits, while low
incorporates just the 8 LSBs of the low-order bits.

As a reminder, the I40 layout in figure 8 does not use
accumulator-specific notation (e.g. AH vs. H), since the I40
structure is accumulator independent. That is, the I40 class
is a base class for the two accumulators. Simply stated, the
ability to do 40-bit manipulation and operations is
accomplished by telling guardhi and low what to do for each
operator and function defined within this structure.
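A minimal sketch of the guardhi/low split follows. Only the two data members and their bit assignments come from the text; the conversion helpers and the 40-bit addition are assumptions, and the addition simply wraps on overflow (saturation on overflow is not modeled).

```cpp
#include <cassert>
#include <cstdint>

// guardhi holds accumulator bits 39..8 (guard bits, high-order word and the
// upper 8 bits of the low-order word); low holds bits 7..0.
struct I40 {
    std::int32_t guardhi;
    std::uint8_t low;

    // Builds an I40 from a signed value that already fits in 40 bits.
    static I40 from_int64(std::int64_t v) {
        assert(v >= -(INT64_C(1) << 39) && v < (INT64_C(1) << 39));
        I40 r;
        r.guardhi = static_cast<std::int32_t>(v >> 8);  // arithmetic shift (guaranteed since C++20)
        r.low = static_cast<std::uint8_t>(v & 0xFF);
        return r;
    }

    std::int64_t to_int64() const {
        return static_cast<std::int64_t>(guardhi) * 256 + low;
    }
};

// 40-bit two's-complement addition that wraps on overflow.
I40 add40(const I40& a, const I40& b) {
    const std::int64_t sum = a.to_int64() + b.to_int64();  // |sum| <= 2^40, no int64 overflow
    const std::int64_t half = INT64_C(1) << 39;
    const std::int64_t wrapped = ((sum + half) & ((INT64_C(1) << 40) - 1)) - half;
    return I40::from_int64(wrapped);
}
```

Adding 1 to the largest positive 40-bit value, $2^{39} - 1$, wraps to $-2^{39}$, which is what a bit-exact model of a non-saturating 40-bit accumulator must reproduce.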

IV. FACT™ Floating-point Model 

A DSPSE FACT™ Floating point model uses C++ classes 
for creating instances of variables. The FACT™ floating 
point data representation is implemented by a C++ class, 
say FLOAT. We attach various data members to our class 
FLOAT to keep track of pertinent information for 
transforming a floating point model to a fixed-point model. 
The preferred embodiment has the following data members: 

Value = current value 

Max_abs = running maximum of the absolute of Value 

Min_abs = running minimum of the absolute of Value 
Avg_abs = running average of the absolute of Value

Var_abs = running variance of the absolute of Value

Read_count = number of read accesses made of Value

Store_count = number of write accesses made of Value

We also declare global variables to keep track of the
number of times a given function is called. In the preferred
embodiment, we keep track of all mathematical operations
(addition, multiplication, subtraction, division). Having the
information provided by the preferred embodiment on any
variable we declare as a FLOAT aids in determining the
computational complexity, dynamic range, scaling effects,
and Q storage format.
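A compact version of such an instrumented FLOAT might look like the sketch below. It follows the listed data members, but the method names, the choice of population variance computed with Welford's running update, and the two global counters shown (addition and multiplication only) are assumptions rather than the DSPSE implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical global operation counters (only + and * are shown).
std::size_t g_add_count = 0;
std::size_t g_mul_count = 0;

// Wraps a double and keeps running statistics of |Value| plus access counts.
class FLOAT {
public:
    FLOAT() = default;
    FLOAT(double v) { store(v); }  // implicit, so FLOAT mixes with plain doubles
    FLOAT& operator=(double v) { store(v); return *this; }

    double value() const { ++read_count_; return value_; }  // counts a read access

    double max_abs() const { return max_abs_; }
    double min_abs() const { return min_abs_; }
    double avg_abs() const { return avg_abs_; }
    // Population variance of |Value| over all stores.
    double var_abs() const {
        return store_count_ ? m2_ / static_cast<double>(store_count_) : 0.0;
    }
    std::size_t read_count() const { return read_count_; }
    std::size_t store_count() const { return store_count_; }

    friend FLOAT operator+(const FLOAT& a, const FLOAT& b) {
        ++g_add_count;
        return FLOAT(a.value() + b.value());
    }
    friend FLOAT operator*(const FLOAT& a, const FLOAT& b) {
        ++g_mul_count;
        return FLOAT(a.value() * b.value());
    }

private:
    // Records a write access and updates the running statistics of |v|.
    void store(double v) {
        value_ = v;
        ++store_count_;
        const double a = std::fabs(v);
        max_abs_ = std::max(max_abs_, a);
        min_abs_ = std::min(min_abs_, a);
        const double delta = a - avg_abs_;  // Welford's running mean/variance update
        avg_abs_ += delta / static_cast<double>(store_count_);
        m2_ += delta * (a - avg_abs_);
    }

    double value_ = 0.0;
    double max_abs_ = 0.0;
    double min_abs_ = std::numeric_limits<double>::infinity();
    double avg_abs_ = 0.0;
    double m2_ = 0.0;  // sum of squared deviations of |Value|
    mutable std::size_t read_count_ = 0;
    std::size_t store_count_ = 0;
};
```

After storing 1.0, -3.0 and 2.0, the object reports Max_abs = 3, Min_abs = 1, Avg_abs = 2 and Var_abs = 2/3. Such ranges are the raw material for choosing the Q format of each variable in the fixed-point model.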

V. Converting a FACT™ Floating-point Model to a
FACT™ Fixed-point Model

Let us turn our attention to converting a floating point model of an algorithm to a given fixed-point processor model. Suppose the floating point model of the algorithm calls N modules; then we need a fixed-point model of each of the N modules for each processor we wish to model. Situations will also arise in which we want to convert certain modules to a fixed-point processor model while leaving other modules as floating point models. One scenario could be a fixed-point encoder in tandem with a floating point decoder; alternatively, you may want to convert only one module at a time to a fixed-point model and still be able to execute your algorithm with the remaining floating point modules.

In order to allow fixed-point and floating point models to coexist, we create an interface class whose purpose is exactly that: to interface a fixed-point module with a floating point module. Let us call the interface class Toint, with a public data member, say data, of class type FLOAT. For the sake of brevity, let N = 2 and suppose we want a fixed-point processor B model for the pure float model example algorithm shown in figure 2. In figure 3, we test a fixed-point model of func1() together with a floating point model of func2(), while in figure 4 the roles of the modules are reversed. Finally, in figure 5, both modules are fixed-point models.

FLOAT func1(FLOAT);
FLOAT func2(FLOAT);

int main() {
    FLOAT a, b, c;
    b = func1(a);
    c = func2(b);
    return 0;
}

Figure 2: Pure FLOAT model. 

Toint func1(Toint);
I64_B func1(I64_B);
FLOAT func2(FLOAT);

int main() {
    Toint a, b;
    FLOAT c;
    b = func1(a);
    c = func2(b.data);
    return 0;
}

Toint func1(Toint d) {
    I64_B e(d);
    I64_B f;
    f = func1(e);
    return (Toint)f;
}

Figure 3: Mixed Model 

FLOAT func1(FLOAT);
Toint func2(Toint);
I64_B func2(I64_B);

int main() {
    FLOAT a;
    Toint b, c;
    b.data = func1(a);
    c = func2(b);
    return 0;
}

Toint func2(Toint d) {
    I64_B e(d);
    I64_B f;
    f = func2(e);
    return (Toint)f;
}

Figure 4: Mixed Model

I64_B func1(I64_B);
I64_B func2(I64_B);

int main() {
    I64_B a, b, c;
    b = func1(a);
    c = func2(b);
    return 0;
}

Figure 5: Pure Fixed Model 

By taking advantage of C++ function overloading (which the compiler implements through name mangling), we create three definitions of a module (i.e. with the same function name): a floating point definition, a fixed-point definition, and an interface definition. The interface definition accepts as arguments interface class objects with data members of class type FLOAT, then converts the objects to the fixed-point data representation class for the desired DSP, in our case a 64-bit data representation for processor B, I64_B. The interface definition then calls the fixed-point definition, which returns a fixed-point class object to the interface definition. The returned fixed-point class object is converted to an interface class object upon return to the function that called the interface definition. The main feature is that we can easily simulate the algorithm on another processor by replacing all instances of the fixed-point data representation objects with a data representation of the target processor. Furthermore, we can obtain assembly-level characteristics in our C++ environment, since we define the behavior of all operations under all control conditions. For example, our add operators can simulate sign extension mode, overflow mode, etc.
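The three-definition pattern can be sketched in compilable C++ as follows. FLOAT, I64_B and Toint here are minimal stand-ins, not the FACT classes: I64_B is assumed to be a Q32 fixed-point type, and the example module func1 simply halves its input (an arithmetic right shift in the fixed-point definition). Unlike figure 3, the interface definition constructs I64_B from d.data.v rather than directly from the Toint object.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Minimal stand-ins; real FACT classes carry statistics and bit-exact behaviour.
struct FLOAT { double v = 0.0; };

// Hypothetical 64-bit fixed-point type for processor B, assumed Q32 format.
struct I64_B {
    static constexpr double kScale = 4294967296.0;  // 2^32
    int64_t raw = 0;
    I64_B() = default;
    explicit I64_B(double x) : raw(static_cast<int64_t>(std::llround(x * kScale))) {}
    double toDouble() const { return static_cast<double>(raw) / kScale; }
};

// Interface class: carries a FLOAT and can be built from a fixed-point result.
struct Toint {
    FLOAT data;
    Toint() = default;
    explicit Toint(const I64_B& f) { data.v = f.toDouble(); }
};

// Three overloads of the same (example) module, which halves its input.
FLOAT func1(FLOAT x) { return FLOAT{x.v * 0.5}; }                // floating point definition
I64_B func1(I64_B x) { I64_B r; r.raw = x.raw >> 1; return r; }  // fixed-point definition
Toint func1(Toint d) {                                            // interface definition
    I64_B e(d.data.v);   // FLOAT -> fixed-point representation
    I64_B f = func1(e);  // overload resolution selects the fixed-point definition
    return Toint(f);     // fixed-point -> interface object
}
```

Since the call site only sees the name func1, switching a module between floating point, mixed, and fixed-point operation is a matter of changing the argument types, as in figures 2 to 5.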

VI. Conclusion 

Using the FACT™ approach, one can create a library, with an efficient class hierarchy, for accurately modeling various DSP fixed-point processors in a C++ environment. Furthermore, the FACT™ library is adaptive, in the sense that other fixed-point processors may be added in their entirety, or, for a fixed-point processor already in the library, its associated operators and their definitions may be added, removed or modified as needed. Once a FACT™ library is available for a given set of processors, any algorithm can be modeled under any fixed-point processor of the library. The multi-processor capability of a FACT™ library facilitates the comparison of an algorithm under different fixed-point processors without necessarily coding at assembly level. Moreover, by using a FACT™ library, the development time involved in going from a fixed-point model to an assembly-level version of a given algorithm is dramatically reduced. The reduction is possible because a FACT™ fixed-point model has assembly-level characteristics built into it.
Figure 6: Layout of TMS320C54x Accumulator A
Figure 7: Layout of TMS320C54x Accumulator B
ABSTRACT

In this paper, we discuss different methods of double talk detection used for echo cancellation in a hands-free telephone set. We derive a general state-space representation of the set, which leads to state-dependent structures for stepsize control. The approach is further generalized by introducing fuzzy logic and fuzzy state memberships. Finally, we show first results obtained with identical detection methods but different control structures.


Introduction 

A hands-free telephone set includes a loudspeaker and a microphone which are placed in the same room. One problem arising from this arrangement is that the far-end signal is retransmitted to the far-end speaker because of the acoustic coupling. This means that the far-end speaker hears his own voice after some delay, which can render the conversation impossible.

This effect can be avoided by echo cancellation: an adaptive filter imitates the acoustic transmission system of the room. An artificial "echo" can thus be produced and subtracted from the local signal. Echo cancellation can deal with double talk situations, since the estimated echo is suppressed while the local speaker signal is transmitted.

The author is supported by the Graduiertenkolleg Intelligente Systeme in der Informations- und Automatisierungstechnik.

Much work has been dedicated to the convergence speed of adaptation algorithms, often based on time-invariant room characteristics. In a real environment, however, the room impulse response is not constant over time, due to movement and temperature changes in the room, and the filter has to be adapted during the whole conversation, which includes noisy and double talk situations. Therefore, the convergence speed of real-time adaptation is strongly influenced by the ability of the system to adapt to the instantaneous situation. Without appropriate adaptation control, even the fastest algorithms cannot guarantee a sufficient adjustment of the filter. This is illustrated here with the NLMS algorithm, whose adaptation equation is


$$\mathbf{c}(k+1) = \mathbf{c}(k) + \alpha\,\frac{e(k)\,\mathbf{x}(k)}{\|\mathbf{x}(k)\|^2} \tag{1}$$


where c(k) is the vector of N filter coefficients at time sample k, x(k) the column vector that comprises the N latest far-end signal samples, e(k) the error that results from subtracting c^T(k) x(k) from the local signal at time k, and α the stepsize. The correction term will be large whenever the error is large, and small when the error is small. But e(k) contains the adaptation error, local noise and the local speaker signal. When this term is large, either the echo canceller is badly adjusted or the local speaker is active. In the latter case the correction should be small instead of large. The more disturbance is involved, the smaller the stepsize must be. Therefore, we need a control mechanism that can determine the situation and choose the stepsize accordingly. Since the detection algorithms are not fully reliable, combining them could be helpful.
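For reference, one iteration of eq. (1) can be sketched in C++ as below. The vector x is assumed to hold the N latest far-end samples with x[0] the newest, and y is the local (microphone) sample. The small regularizer eps is our addition, not part of eq. (1); it only prevents a division by zero when the far-end signal is silent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One NLMS iteration, eq. (1): c(k+1) = c(k) + alpha * e(k) x(k) / ||x(k)||^2.
// Returns the error e(k) = y(k) - c^T(k) x(k).
double nlms_update(std::vector<double>& c, const std::vector<double>& x,
                   double y, double alpha, double eps = 1e-12) {
    assert(c.size() == x.size());
    double yhat = 0.0, energy = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        yhat += c[i] * x[i];    // estimated echo
        energy += x[i] * x[i];  // ||x(k)||^2
    }
    const double e = y - yhat;  // error after echo subtraction
    const double g = alpha * e / (energy + eps);
    for (std::size_t i = 0; i < c.size(); ++i) c[i] += g * x[i];
    return e;
}
```

With α = 1 and no disturbance, a single update makes the a posteriori error vanish; a local speaker in y would, however, drive c away from the room response with the same large correction, which is exactly why the stepsize must be controlled.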

This paper is organized as follows: In section 1 we present and discuss several methods for the determination of an appropriate stepsize. Section 2 introduces a state-space representation of the hands-free telephone set. General control structures including fuzzy concepts are presented in section 3 before we show first results in section 4.


1. METHODS FOR STEPSIZE CONTROL 

Echo cancellation algorithms are usually derived by modelling the far-end speaker signal and the local distortion as mutually uncorrelated white noise. The optimal stepsize can then be calculated as
$$\alpha_{opt}(k) = \frac{E\{\varepsilon^2(k)\}}{E\{e^2(k)\}} \tag{2}$$

where

$$\varepsilon(k) = \left(\mathbf{g} - \mathbf{c}(k)\right)^T \mathbf{x}(k) \tag{3}$$

is the part of the error signal e(k) caused by misadjustment, g being the room impulse response (see [5]).

Hence, the optimal stepsize is one when there is no background noise or local speaker signal, and close to zero in the case of an active local speaker. Since we cannot measure ε(k) but only e(k), the optimal stepsize has to be estimated by one of various methods.

One possibility is proposed in [5]: two different FIR filters are used, an internal one for the determination of the coefficients and the stepsize, and another for the actual echo cancellation. The stepsize can then be estimated by adding artificial delay coefficients as the first taps of the internal filter. These "noncausal" filter coefficients should adapt to zero, so their adaptation quality can be measured and the optimal stepsize calculated in the form:




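$$\hat{E}\{\varepsilon^2(k)\} = \frac{N+N_t}{N_t}\,\sigma_x^2(k)\sum_{i=0}^{N_t-1} c_i^2(k) \tag{4}$$

$$\alpha_{opt}(k) \approx \frac{\hat{E}\{\varepsilon^2(k)\}}{\hat{E}\{e^2(k)\}} \tag{5}$$

with N_t the number of artificial delay coefficients introduced, c_i(k), i = 0, ..., N_t − 1, the artificial delay coefficients, σ_x²(k) the far-end signal power, and N the number of filter coefficients for imitation of the room impulse response.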
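A minimal sketch of the delay-coefficient estimator of eqs. (4) and (5) is given below. It assumes the internal filter stores all N + N_t coefficients with the N_t artificial delay taps first, and that short-term power estimates of x(k) and e(k) are supplied by the caller (how they are smoothed is left open). Clipping the result to [0, 1] and returning 0 for zero error power are our assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stepsize from the artificial delay coefficients, eqs. (4)-(5).
// c: all N + Nt coefficients, the first Nt being the delay taps.
double delay_coeff_stepsize(const std::vector<double>& c, std::size_t Nt,
                            double sigma_x2, double err_power) {
    assert(Nt > 0 && Nt < c.size());
    const std::size_t N = c.size() - Nt;
    double sum = 0.0;
    for (std::size_t i = 0; i < Nt; ++i) sum += c[i] * c[i];  // delay taps only
    const double eps2 = static_cast<double>(N + Nt) / static_cast<double>(Nt)
                        * sigma_x2 * sum;                     // eq. (4)
    if (err_power <= 0.0) return 0.0;
    return std::min(1.0, std::max(0.0, eps2 / err_power));    // eq. (5), clipped
}
```

A large residual on the delay taps relative to the error power signals misadjustment (large stepsize), while an error dominated by local speech drives the ratio, and thus the stepsize, toward zero.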

This algorithm works quite well for time-invariant room impulse responses. It can deal with different levels of background noise and sets the stepsize to very small values during double talk. But the artificial delay coefficients will not detect changes in the room impulse response, because their misadjustment with respect to the zero vector remains the same, whereas the misadjustment of the echo cancellation part of the filter increases.

Another control strategy is to set the stepsize to zero during double talk and to a constant value during single talk (considering the usual background noise, this value will be less than 1). The task here is to detect double talk or dominant background noise and switch the adaptation on or off. One method for double talk detection is proposed in [2] and uses the normalized correlation between the far-end signal and the local signal, which consists of the echo and the local speaker signal. Assuming that the room impulse response only slightly affects the correlation between the far-end speaker signal before and after passing through the room, we assume a high correlation to be caused by low echo attenuation and therefore enable the adaptation, whereas with low correlation it is disabled. The correlation term must be estimated over a limited number of samples, so there is a tradeoff between estimation quality and detection delay. During this delay, double talk, although taking place, has not yet been recognized, which may lead to severe misadjustment. The extent of the damage depends on the convergence speed of the adaptation: with a small stepsize during adaptation, stepsize control by correlation is satisfactory, but not with fast algorithms or large stepsizes.
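The correlation-based on/off control can be sketched as follows. This is an illustration in the spirit of [2], not its exact definition: it uses a zero-lag normalized correlation over the last L samples and takes the absolute value, both simplifying assumptions (a practical detector would account for the echo path delay).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation between far-end signal x and local signal y over the
// last L samples: |sum x*y| / sqrt(sum x^2 * sum y^2).
double normalized_correlation(const std::vector<double>& x,
                              const std::vector<double>& y, std::size_t L) {
    assert(x.size() == y.size() && L <= x.size() && L > 0);
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = x.size() - L; i < x.size(); ++i) {
        sxy += x[i] * y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;
    return std::fabs(sxy) / std::sqrt(sxx * syy);
}

// Crisp control: adapt with a fixed stepsize only when the correlation
// indicates single talk; otherwise freeze the filter.
double correlation_stepsize(double rho, double threshold, double alpha_on) {
    return rho > threshold ? alpha_on : 0.0;
}
```

Section 3 of this paper uses L = 72 samples, a threshold of 0.9 and a stepsize of 1.0 with this kind of detector; a larger L improves the estimate but lengthens the detection delay discussed above.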

Other methods involve knowledge about room and speech characteristics, usually in the frequency domain. One detector recognizes variations of the room impulse response by analyzing the spectra of the error signal and the local signal. A large part of the speech signal power is situated at lower frequencies, whereas moving objects in a room cause high-frequency changes of the room impulse response (see [4]). Therefore, an increasing error signal power, for which the power in the higher band increases in relation to that of the local signal, indicates that the room impulse response has changed; otherwise double talk is assumed. This criterion reacts rapidly because of the short filters used for the analysis, but it only works at the beginning of a variation of the room impulse response and does not indicate when this situation ends. Additionally, this method requires a certain level of echo attenuation to function properly.

Even more speech characteristics are used for the 
so-called cepstral distance measure which is described 
in [1] and determines if the error signal comes from the 
far-end speaker or not by analyzing the cepstrum. Like 
most speech processing tools, it is quite complicated to 
calculate and implies a considerable delay. Many more 
methods can be adopted from speech processing, but 
they are usually not fast enough and will not be further 
discussed in this paper. 

2. STATE-SPACE REPRESENTATION 

These are only some of the known algorithms, but they show that the criteria usually do not distinguish between all the situations, or states, of the hands-free telephone set. The stepsize control should rely on different criteria for different states. Therefore, we interpret the hands-free telephone set as a finite-state machine whose relevant states are described by four parameters, i.e. sufficient/insufficient amplitude of the far-end signal, sufficient/insufficient adaptation, local noise/negligible local noise, and local speaker active/inactive. These parameters lead to the distinction of 16 states, which can be imagined as the corners of a four-dimensional cube. Since the basic condition for efficient adaptation is a sufficient excitation level, we can represent all states without sufficient excitation in one state. This state can easily be determined, as the excitation signal is known. The number of states is thus reduced from 16 to 9. In case of sufficient excitation, the state space can now be represented as a three-dimensional cube. In this representation, we assume that only one variable changes its value at a time, i.e. only the transitions situated at the edges of the cube are allowed. This is justified by the high sampling rate used for real-time applications. If we decide to stop the adaptation entirely during double talk, we can define one state, double talk, comprising the four former double talk states, so that, depending on the chosen control structure, the number of states can be further reduced.

Figure 1: Representation of the hands-free telephone set in the state space. Bold arrows represent critical transitions. (Inner cube: insufficient excitation; outer cube: sufficient excitation; axes: local speaker active, local noise / negligible local noise. Shading: white, large stepsize; light grey, small stepsize; dark grey, stepsize very small or zero.)
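The reduced state space above can be encoded compactly, which makes the "edges of the cube" rule easy to check. The following is a hypothetical encoding: three flag bits describe the eight corners of the cube under sufficient excitation, and a single extra state stands for all low-excitation states. Allowing any transition into or out of the low-excitation state is our assumption, since the text does not restrict those.

```cpp
#include <bitset>
#include <cassert>

// Flags spanning the cube of states with sufficient excitation.
enum : unsigned { kAdapted = 1u << 0, kNoise = 1u << 1, kLocalSpeaker = 1u << 2 };

struct TelState {
    bool excited;    // sufficient far-end amplitude?
    unsigned flags;  // combination of kAdapted, kNoise, kLocalSpeaker (used only if excited)
};

// Only transitions along cube edges are allowed: exactly one parameter changes.
// Transitions involving the merged low-excitation state are always allowed here.
bool transition_allowed(const TelState& a, const TelState& b) {
    if (!a.excited || !b.excited) return true;
    return std::bitset<3>(a.flags ^ b.flags).count() == 1;
}

// The four corners with an active local speaker form the double talk states.
bool is_double_talk(const TelState& s) {
    return s.excited && (s.flags & kLocalSpeaker);
}
```

The eight corners plus the merged low-excitation state give the 9 states mentioned above; merging the four corners for which is_double_talk holds yields the further reduction to a single double talk state.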

The adaptation control should be fast and reliable so as to maximize convergence speed and optimize tracking behaviour. The critical transitions are therefore determined by the ability of the criteria to distinguish between the states. The critical transitions for several criteria are drawn as bold arrows in fig. 1.

The first control algorithm described here (stepsize control by delay coefficients) only needs to be restarted when the room characteristics change, i.e. there are two states to be distinguished, and the critical transitions lead from "sufficient adjustment" to "insufficient adjustment". But in general, as for the correlation coefficient method, the critical transitions are those from single talk to double talk: if these transitions are not detected correctly, the adaptive filter can become completely misadjusted after only a few samples. On the other hand, if we try to detect every transition to double talk, we will also classify many single talk situations as double talk and reduce convergence speed considerably.

To compensate for the weaknesses of the algorithms in some situations, complete control algorithms combine the stepsize control by delay coefficients with the cepstral distance measure (see [1]), usually by a logical AND, or a parameter set is chosen for the method so that the convergence of the echo cancellation is acceptable in all states (see [2]).

To further improve the stepsize control, we can extend both concepts: we can try to optimize either the parameter set for each state of the telephone set or the way the methods are combined, e.g. by nonlinear functions. We can generalize the concept by choosing an appropriate combination of reliable criteria for each state of the telephone set. In this case, we have to determine the instantaneous state and choose the kind of combination to be applied accordingly. In order to extract more than a binary decision from the double talk detectors, we can utilize fuzzy logic, so that there is an individual fuzzy rule base for each state. The new stepsize is one of the clues for the determination of the new state.

Figure 2: Concept for the logic of the control unit, discrete states. Bold lines mark the active path. (Blocks: criteria; state-specific weights; states; state-specific rule base and defuzzification; detection of current state and stepsize.)

Figure 3: Concept for the logic of the control unit, fuzzy states. Dashed lines mark weight control. (Block: determination of new grades of membership and of stepsize.)


We can also take into account the uncertainty about the states, i.e. how reliable the statement is that results from the combination of the possibly contradictory criteria. The uncertainty about the current state would then influence the stepsize: if we are not really sure whether double talk is taking place, the stepsize might be chosen smaller so as to reduce the damage in case the local speaker is active. The structure of this concept is shown in fig. 3. It reduces the propagation of errors, but requires a more sensitive rule base, which is also more difficult to verify.

All the different structures can be shown in the space spanned by the criteria. This is done in fig. 4 with two criteria. Interpreting the results of the two methods as the coordinates of points, we will probably find that the points for a certain state mostly lie in a certain region, but we cannot be sure that those regions are well separated from each other. The aim of the combination of criteria is to construct a function that separates the regions as well as possible, with regard to error probability and quality. Evidently, we will in general be more successful if more types of functions are allowed. The concept of state-dependent functions is illustrated in the lower part of the figure.

Figure 4: The different concepts in the space spanned by two criteria. (Panels: decision based on state-dependent functions; state-dependent decision, fuzzy state membership. Double talk is diagnosed in the shaded regions.)


3. RESULTS WHEN EMPLOYING THE CORRELATION METHOD

In this section we show some results that have been obtained by applying the different approaches to only one method, namely the correlation coefficient, with different parameter sets. The method was described in [2] as a double talk detector. To obtain a good estimate of the correlation coefficient between the incoming and outgoing signals, one has to calculate it over as many samples as possible, but this also leads to a long delay for the double talk detection and is therefore unacceptable. The compromise used in the real-time implementation was to set the number of samples to 72. When the correlation coefficient exceeds 0.9, single talk is assumed, and the stepsize is set to 1.0 in this paper. This concept is a special case of the state-space approach, with one state, one criterion, and crisp logic, and will further be referred to as ρ_72. To extend this principle, we split the criterion: since the correlation coefficient needs some time for the decision, due to the number of samples over which it is calculated, we reduced this number to 20, which makes it less reliable. We therefore used it in two forms: once in its original form, so that it reacts fast (ρ_s), and once in a smoothed form obtained by an AR1 filter (ρ_l). Both criteria are worse than the one used before (see fig. 5 for the case of double talk), but they contain different information, which is combined by a fuzzy logic unit. The fuzzy logic block uses triangular membership functions, inference is realised by minimum and maximum (see [3]), and the defuzzification corresponds to a simplified "center of gravity", i.e. a weighted average of the membership functions of the output:

$$\alpha = \alpha_{high}\,\mu_{\alpha,high} + \alpha_{med}\,\mu_{\alpha,med} + \alpha_{low}\,\mu_{\alpha,low} \tag{6}$$

Figure 5: Comparison of correlation coefficients with three parameter sets

The rule base is shown below, and results are compared with those of the usual criterion in fig. 6.

IF ρ_s HIGH AND ρ_l HIGH, THEN α HIGH

IF ρ_s NOT HIGH AND ρ_l NOT HIGH OR IF ρ_s LOW, THEN α LOW

IF ρ_l NOT HIGH AND ρ_s HIGH OR IF ρ_l HIGH AND ρ_s MEDIUM, THEN α MEDIUM
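A minimal C++ sketch of such a fuzzy unit follows, implementing the three rules with AND = min, OR = max and NOT = 1 − μ. The membership breakpoints (LOW, MEDIUM, HIGH around 0.5, 0.75 and 0.95) and the output singletons (α_high = 1.0, α_med = 0.3, α_low = 0.0) are hypothetical, and the weighted average is normalized by the sum of rule strengths, which is our reading of "weighted average".

```cpp
#include <algorithm>
#include <cassert>

struct Fuzzy3 { double low, med, high; };

double clamp01(double v) { return std::min(1.0, std::max(0.0, v)); }

// Triangle with feet at a and c and peak at b (a < b < c).
double tri(double x, double a, double b, double c) {
    assert(a < b && b < c);
    if (x <= a || x >= c) return 0.0;
    return x < b ? (x - a) / (b - a) : (c - x) / (c - b);
}

// Hypothetical sets: LOW falls from 1 at 0.5 to 0 at 0.75, MEDIUM peaks at
// 0.75, HIGH rises from 0 at 0.75 to 1 at 0.95.
Fuzzy3 fuzzify(double rho) {
    return Fuzzy3{clamp01((0.75 - rho) / 0.25),
                  tri(rho, 0.5, 0.75, 0.95),
                  clamp01((rho - 0.75) / 0.2)};
}

double fuzzy_stepsize(double rho_s, double rho_l,
                      double a_high = 1.0, double a_med = 0.3, double a_low = 0.0) {
    const Fuzzy3 s = fuzzify(rho_s), l = fuzzify(rho_l);
    // IF rho_s HIGH AND rho_l HIGH THEN alpha HIGH
    const double w_high = std::min(s.high, l.high);
    // IF (rho_s NOT HIGH AND rho_l NOT HIGH) OR rho_s LOW THEN alpha LOW
    const double w_low = std::max(std::min(1.0 - s.high, 1.0 - l.high), s.low);
    // IF (rho_l NOT HIGH AND rho_s HIGH) OR (rho_l HIGH AND rho_s MEDIUM) THEN alpha MEDIUM
    const double w_med = std::max(std::min(s.high, 1.0 - l.high), std::min(l.high, s.med));
    const double sum = w_high + w_med + w_low;
    if (sum == 0.0) return a_low;  // no rule fires: stay conservative
    return (a_high * w_high + a_med * w_med + a_low * w_low) / sum;
}
```

The fast criterion ρ_s can veto adaptation on its own (LOW rule), while a large stepsize requires agreement of both criteria, which matches the cautious intent of the rule base.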

Figure 6: Comparison of the results with time-independent structures

Figure 7: Comparison of state-dependent approaches to the correlation method (horizontal axis: time in seconds, 0 to 2 s)

Our second extension regards the number of states. In the simulations we observed that a relatively high threshold, chosen to provide stability in double talk situations, leads to slow convergence in single talk. So we try to infer the instantaneous state from the latest stepsize, i.e. if we calculate a large stepsize, we assume single talk and thus facilitate large stepsizes for the next sample. It also means that in the case of a large stepsize during double talk, which is in itself an error, the damage will be amplified by encouraging further large stepsizes. Therefore we must be very certain that single talk is taking place when detecting it. Our new fuzzy logic unit differs only in the parameters of the membership functions and of the defuzzification algorithm; it uses the same rule base as before. The states are determined by a state-dependent switch:

IF α < 0.5 α_max THEN SET DOUBLE TALK TRUE;
ELSE IF α > 0.8 α_max THEN SET DOUBLE TALK FALSE;
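The switch above is a hysteresis: between the two thresholds the previous state is kept. A small C++ sketch, with a hypothetical per-state parameter set to show how the detected state selects the fuzzy unit's parameters (FuzzyParams and select_params are illustrative names):

```cpp
#include <cassert>

// State-dependent switch with hysteresis: enter double talk below
// 0.5*alpha_max, leave it above 0.8*alpha_max, otherwise keep the state.
bool update_double_talk(bool double_talk, double alpha, double alpha_max) {
    assert(alpha_max > 0.0);
    if (alpha < 0.5 * alpha_max) return true;
    if (alpha > 0.8 * alpha_max) return false;
    return double_talk;
}

// Hypothetical per-state parameter sets for the fuzzy unit (same rule base,
// different defuzzification parameters).
struct FuzzyParams { double a_high, a_med, a_low; };

const FuzzyParams& select_params(bool double_talk, const FuzzyParams& dt,
                                 const FuzzyParams& st) {
    return double_talk ? dt : st;
}
```

The gap between the thresholds keeps the state from toggling on every sample when the stepsize hovers around a single threshold.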

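In the last step we transform the states into fuzzy numbers. The uncertainty about the state is thus integrated into the stepsize determination. The membership function of the state of double talk is calculated from:

$$\tilde{\mu}_{dt} = \mu_{dt} + \begin{cases} 0.6 - \dfrac{\alpha}{\alpha_{max}}, & \alpha < 0.6\,\alpha_{max} \\[6pt] 0.8 - \dfrac{\alpha}{\alpha_{max}}, & \alpha > 0.8\,\alpha_{max} \end{cases} \tag{7}$$

with

$$\mu_{dt} = \begin{cases} 1, & \tilde{\mu}_{dt} \ge 1 \\ 0, & \tilde{\mu}_{dt} \le 0 \\ \tilde{\mu}_{dt}, & 0 < \tilde{\mu}_{dt} < 1 \end{cases} \tag{8}$$

The state membership also influences the fuzzy numbers, as shown here for LOW:

$$\mathrm{LOW} = \mu_{dt}\,\mathrm{LOW}_{dt} + (1 - \mu_{dt})\,\mathrm{LOW}_{st}$$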
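A sketch of the fuzzy state membership of eqs. (7) and (8) and of the LOW blending rule is given below. Two readings are assumptions: μ_dt on the right-hand side of (7) is taken as the value from the previous sample, and for 0.6 α_max ≤ α ≤ 0.8 α_max the membership is left unchanged, since only the two outer cases are given.

```cpp
#include <algorithm>
#include <cassert>

// Fuzzy double-talk membership per eqs. (7)-(8).
double update_mu_dt(double mu_prev, double alpha, double alpha_max) {
    assert(alpha_max > 0.0);
    const double r = alpha / alpha_max;
    double mu = mu_prev;              // eq. (7), before clipping
    if (r < 0.6)      mu += 0.6 - r;  // small stepsize: more double talk
    else if (r > 0.8) mu += 0.8 - r;  // large stepsize: less double talk
    return std::min(1.0, std::max(0.0, mu));  // eq. (8): clip to [0, 1]
}

// Blend a state-specific parameter of a fuzzy set, e.g. a breakpoint of LOW:
// LOW = mu_dt * LOW_dt + (1 - mu_dt) * LOW_st.
double blend(double mu_dt, double value_dt, double value_st) {
    return mu_dt * value_dt + (1.0 - mu_dt) * value_st;
}
```

Compared with the crisp switch, the membership moves gradually, so one wrong stepsize shifts the fuzzy sets only partially toward the single talk parameters.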

The results of the two state-dependent structures are shown in fig. 7. Both structures lead to better convergence than the state-independent approach, but the computational load grows and the optimization of the parameters becomes more difficult to handle.


4. CONCLUSION 

In this paper we presented several structures for stepsize control in adaptation algorithms. By splitting a generally optimized double talk detector into two criteria with different parameter sets, we could extract more detailed information and combine it by state-dependent fuzzy logic. The results improved with every new extension of the structure. The number of parameters increases noticeably when fuzzy logic is included, so an optimization over a complete fuzzy state-dependent rule base might improve the results even more, especially since better state determination will probably improve the results considerably for state-dependent control.
ABSTRACT

The goal of this paper is to present a methodology for the design of VLSI circuits for image and video coding applications. The software environments for high-level and bit-true level simulations and for hardware development are described. An example of an area-efficient single-chip implementation of a JPEG coder is presented to illustrate the methodology.

1. INTRODUCTION 

The basic steps of the methodology reported here are: a) high-level implementation of the algorithm, b) VLSI architecture design, c) bit-true level modeling, d) datapath development, and e) simulations. A brief description of the Joint Photographic Experts Group (JPEG) standard is given below to illustrate the procedure.

The JPEG standard describes an algorithm for the coding of continuous-tone still images [1]. It specifies four modes of operation: sequential, progressive, lossless and hierarchical. The sequential mode is by far the most used, since it covers a wide range of applications, and is hence the mode we address in this paper. The sequential (or baseline) mode is depicted in Figure 1. The algorithm is a DCT (Discrete Cosine Transform) based process which sequentially transforms blocks of 8x8 pixels from the original image into 8x8 blocks of coefficients in the frequency domain [2]. The goal of this transformation is to decorrelate the original data and to concentrate the signal energy in only a small set of transform coefficients in the low-frequency zone. Based on psychovisual analysis, a normalization array can be defined. Its purpose is to quantize the DCT coefficients that are visually significant with relatively short quantization steps, while using large quantization steps for those coefficients which are less important. This large-step quantization, associated with the energy packing effect of the transformation, results in general in the zeroing out of many DCT coefficients. A long sequence of zero-valued coefficients can then be efficiently abridged by runlength coding. Though the quantization is the main mechanism of data compression (and also of information loss), additional data compression can be obtained by entropy coding the output of the quantizer.

The high-level and bit-true level implementations of this algorithm are described in sections 2 and 4 respectively, along with a description of the software environment. The VLSI architectures are discussed in section 3. Datapath development is discussed in section 5. Finally, the results and conclusions are given in section 6.

*This work was supported in part by the Laboratory of Microtechnology (LMT EPFL), common to the Swiss Federal Institute of Technology, Lausanne, and the Institute of Microtechnology, University of Neuchatel.

2. HIGH LEVEL IMPLEMENTATION 

A modular high-level implementation of the algorithm is shown in Figure 2. The top and bottom flowgraphs represent the encoder and decoder, respectively. Each process of the algorithm was implemented with C code, using floating-point precision for all arithmetic operations. Each module was then converted into a Khoros routine of the Khoros software environment [3]. From a digital image processing point of view, this high-level implementation is an application program in itself [4]. Typical compression ratios are around 10, depending on the image activity and spatial resolution. A toolbox was created and integrated into the Khoros system. Besides the routines shown in Figure 2, this toolbox contains some mathematical functions and some programs for other methods of image compression (e.g., adaptive image compression).

Figure 1: JPEG baseline algorithm

Figure 2: Implementation of JPEG in Khoros

3. ARCHITECTURES 

Taking into consideration the constraints of the target application, a VLSI architecture was developed for each block in Figure 2. Given their associated high data throughput, image and video coding applications require particularly high-speed implementations. Using bit-serial architectures leads to tightly pipelined structures at the bit level, which allows a maximum clock rate to be achieved [5]. Furthermore, bit-serial modules require less area than their parallel counterparts. Thus, whenever possible, we have chosen a fully pipelined bit-serial approach. In the following paragraphs we describe the VLSI architectures for each of the main modules of the encoder in Figure 2.

a) FDCT: The forward 2-D DCT appropriate for JPEG is defined in [1] as:

$$F(u,v) = \frac{1}{4}\,C_u C_v \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16}$$

for u, v = 0, 1, ..., 7, where C_u, C_v = 1/√2 for u, v = 0, and C_u, C_v = 1 otherwise. Given that the 2-D DCT is separable, it can be reformulated as two successive 1-D DCTs, which leads to a simpler hardware implementation. Each 1-D DCT is executed by a Distributed Arithmetic Processor (DAP) whose architecture is shown in Figure 3. Pre-addition/pre-subtraction operations can be used to exploit the symmetry of the 1-D DCT kernel. This results in halving the number of multiplications or, equivalently, in reducing the size of the DAP's ROM from 2^8 to 2^4 words [6]. The architecture also includes a transposition memory between the two 1-D transforms to store the results of the first 1-D DCT.

Figure 3: Distributed Arithmetic Processor
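The principle of a distributed arithmetic processor can be sketched in software. With the pre-adder forming x_i + x_{7−i} (even outputs) or x_i − x_{7−i} (odd outputs), each 1-D DCT output becomes a 4-term inner product, so a ROM addressed by one bit of each of the 4 inputs (2^4 words) suffices. The sketch below shows only that 4-input ROM stage; it assumes B-bit two's-complement integer inputs and integer coefficients (hardware would use fixed-point constants), and the function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int K = 4;  // inputs per inner product (after pre-addition)

// ROM word at address addr = sum of the coefficients whose address bit is set.
std::array<int64_t, 1 << K> build_rom(const std::array<int64_t, K>& a) {
    std::array<int64_t, 1 << K> rom{};
    for (unsigned addr = 0; addr < (1u << K); ++addr)
        for (int i = 0; i < K; ++i)
            if (addr & (1u << i)) rom[addr] += a[i];
    return rom;
}

// Bit-serial DA inner product y = sum_i a_i * x_i: one bit plane per cycle,
// shifted and accumulated; the sign-bit plane has negative weight.
int64_t da_inner_product(const std::array<int64_t, 1 << K>& rom,
                         const std::array<int32_t, K>& x, int B) {
    assert(B >= 2 && B <= 31);
    int64_t acc = 0;
    for (int b = 0; b < B; ++b) {
        unsigned addr = 0;
        for (int i = 0; i < K; ++i)
            addr |= ((static_cast<uint32_t>(x[i]) >> b) & 1u) << i;
        const int64_t term = rom[addr] * (int64_t{1} << b);
        acc += (b == B - 1) ? -term : term;
    }
    return acc;
}
```

No multiplier is needed: each cycle performs one ROM lookup and one shift-and-add, which is why DA fits the bit-serial, fully pipelined approach chosen above. Note that the pre-added inputs need one more bit than the original samples.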

b) Quantizer: The second operation of the JPEG coder is the quantization of the 2-D DCT coefficients, defined as Cq_ij = round(C_ij / N_ij) for i, j = 0, 1, ..., 7, where C_ij denotes the ij-th element of an 8x8 DCT coefficient matrix, whereas N_ij denotes the ij-th element of an 8x8 normalization matrix. Reformulating the previous equation as Cq_ij = round(C_ij · (1/N_ij)) allows the rather complex circuitry of a divider to be replaced by that of a simple multiplier. The VLSI architecture of the quantizer is shown in Figure 4. Since the output of the 2-D DCT processor is bit-serial, a serial-parallel multiplier was implemented. The parallel input to the multiplier is the output of a ROM containing the inverses of the normalization coefficients.

Figure 4: Architecture of the quantizer
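The divider-free quantizer can be sketched as follows. The reciprocal ROM entry is assumed to be stored with F = 16 fractional bits, and rounding is half away from zero (like std::lround); both are assumptions for illustration. For non-power-of-two N the finite reciprocal precision can occasionally change the rounded result, which is exactly what a bit-true model is meant to expose.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int F = 16;  // assumed fractional bits of the reciprocal ROM

// ROM entry for normalization coefficient N: round(2^F / N).
int32_t reciprocal_entry(int N) {
    assert(N > 0);
    return static_cast<int32_t>(std::lround(std::ldexp(1.0, F) / N));
}

// Cq = round(C * (1/N)) using a multiplier instead of a divider.
int32_t quantize(int32_t C, int32_t recip) {
    const int64_t p = static_cast<int64_t>(C) * recip;
    const int64_t half = int64_t{1} << (F - 1);
    return static_cast<int32_t>(p >= 0 ? (p + half) >> F : -((-p + half) >> F));
}
```

For example, with N = 16 the ROM holds 4096, and C = 100 gives 409600 / 65536 = 6.25, which rounds to 6.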

c) Entropy coder: The last operation of the JPEG algorithm is entropy coding. Its goal is to increase the compression performance of the encoder by taking advantage of the statistics of the symbols at the output of the quantizer. While both Huffman and arithmetic coders are supported by the JPEG standard, the baseline JPEG algorithm uses Huffman coding only. The circuit of the Huffman coder was based on [7]. Before assigning a variable-length code to its input symbols, the Huffman coder must perform other operations, i.e., the runlength coding mentioned in the introduction, a category selection, etc. All these tasks were implemented with random logic, and the Huffman table proposed in Annex K.3 of [1] was stored in a ROM. A logic circuit packs the compressed bits into 32-bit words at the output of the Huffman coder.
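The runlength coding and category selection that precede the Huffman table lookup can be sketched as below for the 63 AC coefficients of a zigzag-ordered block. In baseline JPEG each symbol is a (run, size) pair, where (15, 0) (ZRL) stands for 16 zeros and (0, 0) marks the end of block (EOB); this sketch omits the amplitude bits that follow each symbol.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// JPEG category (SSSS): number of bits needed to represent |v|.
int category(int v) {
    int a = std::abs(v), s = 0;
    while (a) { ++s; a >>= 1; }
    return s;
}

// (run, category) symbols for the AC coefficients zz[1..63] of a block in
// zigzag order: (15,0) = ZRL, (0,0) = EOB.
std::vector<std::pair<int, int>> ac_symbols(const std::vector<int>& zz) {
    assert(zz.size() == 64);
    std::vector<std::pair<int, int>> out;
    int run = 0;
    for (int k = 1; k < 64; ++k) {
        if (zz[k] == 0) { ++run; continue; }
        while (run > 15) { out.emplace_back(15, 0); run -= 16; }
        out.emplace_back(run, category(zz[k]));
        run = 0;
    }
    if (run > 0) out.emplace_back(0, 0);  // trailing zeros: end of block
    return out;
}
```

Each (run, category) pair is then used as the address into the Huffman code table, which matches the hardware split between random logic for symbol formation and a ROM for the codes.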

Between the quantizer and the entropy coder in Figure 1, some additional operations are defined by the JPEG standard: a) Raster to zigzag reordering: the reordering of the 2-D array of quantized 8x8 DCT coefficients into a 1-D array, in order of increasing spatial frequency. This is implemented with a RAM with different read and write address sequences. b) DPCM: a DPCM coding of the DC DCT coefficients. This operation is implemented with a single subtractor and a register to store the prediction value. The latter operation is embedded at the input of the Huffman coder.
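Both operations are simple enough to sketch directly. The zigzag order below is generated by walking the anti-diagonals of the 8x8 block in alternating directions; reading a raster-ordered RAM with this address sequence performs the reordering. The DC DPCM uses one subtractor and one predictor register, initialized to 0 as at the start of a JPEG scan. Function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Zigzag scan order: order[k] is the raster index of the k-th coefficient.
std::array<int, 64> zigzag_order() {
    std::array<int, 64> order{};
    int k = 0;
    for (int s = 0; s < 15; ++s) {        // s = row + col (anti-diagonal)
        for (int i = 0; i <= s; ++i) {
            const int row = (s % 2 == 0) ? s - i : i;
            const int col = s - row;
            if (row < 8 && col < 8) order[k++] = row * 8 + col;
        }
    }
    assert(k == 64);
    return order;
}

// RAM written in raster order, read back in zigzag order.
std::array<int, 64> to_zigzag(const std::array<int, 64>& raster) {
    static const std::array<int, 64> order = zigzag_order();
    std::array<int, 64> out{};
    for (int k = 0; k < 64; ++k) out[k] = raster[order[k]];
    return out;
}

// DPCM of the DC coefficients of successive blocks.
std::vector<int> dc_dpcm(const std::vector<int>& dc) {
    std::vector<int> diff;
    int pred = 0;  // predictor register
    for (int v : dc) { diff.push_back(v - pred); pred = v; }
    return diff;
}
```

In hardware the order table corresponds to the read address sequence of the reordering RAM, so no data movement beyond one write and one read per coefficient is required.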


Figure 5: GUI of the 2-D DCT bit-true level model (panel "2-D Discrete Cosine Transform"; fields: input file, output file, number of points (1-D DCT), and input, output, intermediate input and intermediate output wordlengths in bits, each with a rounding or truncation quantization option; input scale; intermediate data scale; options "Model pre-additions" and "Implement an Offset Binary Code")

4. BIT-LEVEL IMPLEMENTATIONS

Based on the programming framework of the high-level implementation, a bit-true level model of the JPEG algorithm was built. Each process of the algorithm was again implemented with C code and converted into a Khoros routine. This time, however, the arithmetic operations are all carried out in binary arithmetic, accurately modeling the same processing and dataflow as they would be executed by the corresponding processors and architectures described in Section 3. The graphical user interface (GUI) corresponding to the bit-true level model of the 2-D DCT is shown in Figure 5. Both rounding and truncation can be modeled. The scale factors are required to fix the location of the binary point for each register. By using offset binary coding, the number of words of the DAP's ROM can be reduced by a factor of two. Intermediate data refers to the data between the two 1-D DCTs. To analyze the quantization effects of a particular process, one simply substitutes the high-level implementation of that process in Figure 2 by its corresponding bit-true level model. Exhaustive simulations can then be executed to determine the optimum wordlength for the different signals and coefficients. The effects on the final compression ratio and reconstruction quality can also be easily evaluated. This bit-true level reconfiguration and subsequent simulation is easily done with a few clicks on the Khoros visual language interface (Cantata).
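The bit-true idea can be made concrete with a small model of a register. The Python sketch below is only an illustration (the actual models were C routines inside Khoros): `quantize_fixed` models a two's-complement register of a given wordlength whose binary point is fixed by a scale (number of fractional bits), with truncation or rounding, and `dct8` is a plain 8-point DCT, computed directly rather than by distributed arithmetic, whose coefficients and outputs can be passed through such registers. Wordlengths and scales are free parameters, not the values of the circuit.

```python
import math


def quantize_fixed(x, wordlength, frac_bits, mode="truncate"):
    """Value of x after storage in a two's-complement register of `wordlength`
    bits with `frac_bits` bits after the binary point.

    'truncate' drops the extra bits (rounds toward minus infinity),
    'round' rounds to the nearest step; out-of-range values saturate.
    """
    scaled = x * 2 ** frac_bits
    q = math.floor(scaled) if mode == "truncate" else math.floor(scaled + 0.5)
    lo, hi = -2 ** (wordlength - 1), 2 ** (wordlength - 1) - 1
    return max(lo, min(hi, q)) / 2 ** frac_bits


def dct8(x, coeff_q=None, data_q=None):
    """Orthonormal 8-point DCT-II; coeff_q and data_q optionally model the
    finite wordlength of the cosine coefficients and of the outputs."""
    coeff_q = coeff_q or (lambda c: c)
    data_q = data_q or (lambda v: v)
    out = []
    for k in range(8):
        scale = math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)
        acc = sum(x[n] * coeff_q(scale * math.cos((2 * n + 1) * k * math.pi / 16))
                  for n in range(8))
        out.append(data_q(acc))
    return out


def snr_db(reference, approx):
    """Signal-to-noise ratio of approx with respect to reference, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


row = [52, 55, 61, 66, 70, 61, 64, 73]            # arbitrary test row
exact = dct8(row)
bit_true = dct8(row,
                coeff_q=lambda c: quantize_fixed(c, 10, 9),        # 10-bit coefficients
                data_q=lambda v: quantize_fixed(v, 12, 0, "round"))  # 12-bit integer outputs
print(snr_db(exact, bit_true))
```

Sweeping the wordlength arguments and recording `snr_db` reproduces, in miniature, the exhaustive wordlength search described above.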

For the 2-D DCT circuit, the optimum data and ROM wordlengths of the DAP were found to be 12 and 10 bits, respectively. Simulations also showed no difference between the results obtained with rounding and with truncation, so truncation mode was used. For the quantizer circuit, a ROM wordlength of 9 bits and an input/output data wordlength of 12 bits were found. With these values, we obtained for several test images practically the same compression ratio and reconstructed image quality (signal-to-noise ratio) as with the floating-point modules. The quantization table proposed in Annex K.1 of reference [1] was retained for our circuit.
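The DAP relies on distributed arithmetic, which is why a ROM wordlength appears among the optimized parameters. As a reminder of the technique, the sketch below computes an inner product sum A_k x_k in the distributed-arithmetic way: a ROM of 2^K words stores every partial sum of the K coefficients, and each bit-slice of the inputs addresses it once. It is plain two's-complement DA, without the offset binary coding of the actual circuit (which would halve the ROM); the function names are hypothetical.

```python
def da_rom(coeffs):
    """ROM contents: word `addr` = sum of coeffs[i] for every bit i set in addr."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(coeffs, xs, bits):
    """sum(c * x) for `bits`-bit two's-complement integers xs, computed with one
    ROM lookup per bit-slice and shift-accumulation (no multipliers)."""
    rom = da_rom(coeffs)
    acc = 0
    for b in range(bits):
        # Bit b of every input forms the ROM address for this cycle.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        weight = -(1 << b) if b == bits - 1 else (1 << b)  # sign bit weighs negative
        acc += weight * rom[addr]
    return acc
```

In a bit-true model, the ROM words would additionally be quantized to the chosen ROM wordlength (10 bits in the circuit above), and the accumulator to the data wordlength.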

5. DATAPATH DEVELOPMENT 

For datapath development we use the tools of the Compass environment [8]. When a part of an algorithm is highly regular, the Layout Editor tool is used to create full-custom modules. This does not penalize the development time, since just a few cells must be designed; on the other hand, minimum silicon area and high speed are achieved. For less regular structures, the standard-cell approach is used. The Cell Compiler tool is used to generate random access memory modules. The circuit is built with the Logic Assistant tool. Intensive simulations are then carried out with both the Mixed-Mode and SPICE simulators.

6. RESULTS AND CONCLUSIONS

A semi-custom methodology for the VLSI implementation of image compression systems was reported. The software environments were described along with an example, the implementation of a JPEG coder circuit. The area of the resulting chip is 4.6 x 3.1 mm ≈ 14.5 mm² without pads. It was implemented in the 1.2 µm CMN12 process from VLSI Technology Inc. At a clock frequency of 36 MHz, this circuit is able to process 25 CIF (Common Intermediate Format: 352 x 288 pixels) images per second. It is thus suitable for motion JPEG (MJPEG) or for the non-recursive path of the H.261 low-bit-rate video coder. Several improvements have been studied for this circuit, one being a power-saving technique that trades image quality for power consumption [9].
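As a rough check of the throughput figure, the clock budget per sample follows from the numbers above. Whether the 25 images per second include chrominance is not stated, so the sketch shows both the luminance-only case and an assumed 4:2:0 format (two quarter-size chrominance planes).

```python
CLOCK_HZ = 36_000_000
FRAMES_PER_S = 25
WIDTH, HEIGHT = 352, 288                     # CIF luminance resolution

luma_rate = WIDTH * HEIGHT * FRAMES_PER_S    # luminance samples per second
chroma420_rate = luma_rate * 3 // 2          # assumption: 4:2:0 chrominance added

cycles_luma_only = CLOCK_HZ / luma_rate
cycles_with_chroma = CLOCK_HZ / chroma420_rate
print(luma_rate, chroma420_rate)                                  # 2534400 3801600
print(round(cycles_luma_only, 1), round(cycles_with_chroma, 1))   # 14.2 9.5
```

Either way, the circuit has roughly 9 to 14 clock cycles available per sample at 36 MHz.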

Though image compression was addressed in this paper, the methodology could equally be applied to other kinds of image processing applications. Intermediate results can also be used to develop solutions for other kinds of technologies (DSP, FPGA, etc.). Current work involves the extension of the current methodology to the development of low-power image compression circuits for portable applications.
ABSTRACT

This paper presents DTMF (dual-tone multi-frequency) signaling detection using a nonuniform digital filter bank. The detection algorithm uses a modified Goertzel algorithm [1]. It has been implemented on the TMS320C50 digital signal processor. To reduce the size of the analyzed block of samples, a varying block size, different for each DTMF frequency, is proposed.


1. INTRODUCTION

The DTMF system for push-button telephone sets is a CCITT standard that appears as Recommendation Q.24 in the CCITT Blue Book [2]. In DTMF signaling, each signal consists of a pair of sinusoidal signals with specified frequencies. These pairs are allocated to the various digits and symbols of a touch-tone keypad. The frequencies belong to two mutually exclusive groups (the low-frequency group and the high-frequency group) of four frequencies each. The major requirements of the DTMF system are as follows:

• frequencies of received signals:
  low-frequency group: 697, 770, 852, 941 Hz,
  high-frequency group: 1209, 1336, 1477, 1633 Hz,
  frequency tolerance: ±1.8%,

• level of received signals for which the receiver works correctly: 0 to -30 dBm,

• twist: ±5 dB,

• time parameters:
  duration of a generated signal: min. 60 ms,
  duration of a break between signals of two consecutive digits: min. 60 ms,
  maximum signaling speed: one digit per 120 ms,
  duration of signal recognition: max. 40 ms,
  duration of break recognition: max. 40 ms.
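For testing a detector, it helps to synthesize the signals specified above. The sketch below uses the standard keypad assignment (rows 697 to 941 Hz, columns 1209 to 1633 Hz, with A to D in the fourth column), an 8 kHz sampling rate and the 60 ms minimum tone duration; the amplitude and the way twist is applied are assumptions of the sketch.

```python
import math

LOW = [697, 770, 852, 941]        # Hz, keypad rows
HIGH = [1209, 1336, 1477, 1633]   # Hz, keypad columns
KEYPAD = ["123A", "456B", "789C", "*0#D"]


def dtmf_pair(key):
    """(low, high) frequency pair of a keypad symbol."""
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col >= 0:
            return LOW[row], HIGH[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_tone(key, duration_s=0.060, fs=8000, amplitude=0.5, twist_db=0.0):
    """Samples of one DTMF digit; twist_db raises the high tone relative to the low one."""
    f_low, f_high = dtmf_pair(key)
    a_high = amplitude * 10 ** (twist_db / 20)
    n = round(duration_s * fs)
    return [amplitude * math.sin(2 * math.pi * f_low * t / fs)
            + a_high * math.sin(2 * math.pi * f_high * t / fs) for t in range(n)]
```

A 60 ms digit at 8 kHz is 480 samples long.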

This work was supported partly by grant KBN-44-4.’)2 and partly by DPB-44-443/11.


2. SPECTRAL ANALYSIS OF DTMF SIGNALS

Detection of DTMF signals is performed using the discrete Fourier transform (DFT). Because only a subset of the DFT output samples is needed, FFT (fast Fourier transform) algorithms are not optimally efficient in this case.

A better approach is a Goertzel-type DFT algorithm, which allows serial processing and requires less memory and fewer computations. The Goertzel algorithm is given by the graph in Fig. 1.



Figure 1. Goertzel algorithm

To reduce errors in the allocation of the center frequencies [4], the coefficients $\cos(2\pi k/N)$ can be replaced by $\cos(2\pi f_k/f_s)$, where $f_k$ is the DTMF frequency and $f_s$ the sampling frequency, but the effect of this correction is not of primary importance.

Because the spectral analysis uses only the energies of the eight DTMF sinusoidal components and their second harmonics, the graph of the Goertzel algorithm reduces to the form in Fig. 2. The basic algorithm step is then given by

$$w_k(n) = x(n) + 2\cos(2\pi f_k/f_s)\, w_k(n-1) - w_k(n-2) \qquad (1)$$

At the end of a block of N samples, the energy can be computed as

$$|X_k(N)|^2 = w_k^2(N-1) + w_k^2(N-2) - 2\cos(2\pi f_k/f_s)\, w_k(N-1)\, w_k(N-2) \qquad (2)$$



Figure 2. Goertzel algorithm for DTMF signal detection
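Equations (1) and (2) translate into a few lines of code. The Python sketch below is a floating-point illustration, not the TMS320C5x fixed-point assembler of the paper: it runs the recursion with the coefficient 2cos(2πf_k/f_s), and `dft_energy` is a direct DFT evaluation at the same frequency, included only as a reference for comparison.

```python
import cmath
import math


def goertzel_energy(x, f, fs):
    """|X(f)|^2 of block x via recursion (1) and the final energy formula (2)."""
    coeff = 2 * math.cos(2 * math.pi * f / fs)
    w1 = w2 = 0.0                         # w(n-1), w(n-2)
    for sample in x:
        w0 = sample + coeff * w1 - w2     # eq. (1)
        w2, w1 = w1, w0
    return w1 * w1 + w2 * w2 - coeff * w1 * w2   # eq. (2), with w1 = w(N-1), w2 = w(N-2)


def dft_energy(x, f, fs):
    """Reference: |sum_n x(n) exp(-j 2 pi f n / fs)|^2 evaluated directly."""
    return abs(sum(v * cmath.exp(-2j * math.pi * f * n / fs) for n, v in enumerate(x))) ** 2
```

Each detector needs only two state variables and one multiplication per sample, which is why the Goertzel form suits a block-serial DSP implementation.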

3. NUMBER OF ANALYZED SAMPLES 

Proper selection of the number N of analyzed samples is very important [3,5]. The number should be as small as possible to allow a correct analysis of the duration of a signal (or break). At an 8 kHz sampling rate and a 40 ms signal duration, N would be 160. But as the number of samples decreases, the signal-to-noise ratio (SNR) decreases too. A second problem is the degradation of the receiver frequency resolution and the increase in the error of the center frequencies of the filter passbands, provided that theoretical coefficients like those in equation (1) are used. Fig. 3 presents the maximum of the frequency error, defined as

$$\delta_k = \frac{\tilde{f}_k - f_k}{f_k} \cdot 100\%, \qquad (3)$$

where $\tilde{f}_k$ is the frequency computed for the given number N and $f_k$ is the nominal DTMF frequency.



Figure 3. Maximum frequency error for all DTMF 
frequencies 

With a constant N for all DTMF frequencies, N must be set to at least 205 (cf. Fig. 3). For N = 205, the maximum frequency error over all DTMF frequencies equals 1.3%, which is tolerably small.
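Assuming that each detector is tuned to the DFT bin nearest to the DTMF frequency, k = round(N f_k / f_s), the computed frequency in (3) is k f_s / N, and the worst-case error over the eight frequencies can be evaluated for any N. For N = 205 at 8 kHz, this sketch places the worst case at 770 Hz.

```python
DTMF_FREQS = [697, 770, 852, 941, 1209, 1336, 1477, 1633]  # Hz


def frequency_error_percent(f, n, fs=8000):
    """Eq. (3): relative deviation of the nearest bin centre k*fs/N from f."""
    k = round(f * n / fs)          # assumption: nearest DFT bin
    return (k * fs / n - f) / f * 100


def max_frequency_error(n, fs=8000):
    """Largest |error| over all eight DTMF frequencies for block length n."""
    return max(abs(frequency_error_percent(f, n, fs)) for f in DTMF_FREQS)
```

Plotting `max_frequency_error(n)` over a range of n gives a curve of the kind shown in Fig. 3.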

To reduce the frequency error at a smaller N, different numbers N can be selected depending on the detected frequencies. Table 1 presents the error values for 80 ≤ N ≤ 85, corresponding to the theoretical coefficients cos(2πk/N) computed according to equation (1). Fig. 4 also shows the DFT corresponding to digit "5" for different numbers N.

Table 1. Errors for varying block sizes N chosen optimally for individual frequencies



Figure 4. DFT of digit "5" for different numbers N
The number N = 80 is very interesting because the analysis time amounts to 10 ms at an 8 kHz sampling rate or 20 ms at a 4 kHz sampling rate, i.e., one fourth or one half of the maximum recognition time, respectively. Fig. 4 shows that the DFT magnitude for N = 80 is much worse than for N = 205: the energy level differences between the detected frequencies (770 Hz and 1336 Hz) and the neighbouring signals are 8 dB and 3 dB for N = 205 and N = 80, respectively. Therefore, a constant length N = 80 cannot be used. For a given frequency of interest, a better DFT spectrum can be found, e.g., with N = 83 for 770 Hz or N = 84 for 1336 Hz. But this advantage cannot be used directly, because these frequencies are disturbed by the high energies of neighbouring DTMF frequencies if the optimal numbers N from Table 1 are chosen for them. For the spectral analysis of DTMF signals, the energies for the particular values of N, e.g., 80 ≤ N ≤ 85 (cf. Table 1), are therefore summed, and the analysis is performed on the total energies.
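The per-frequency optimum within 80 ≤ N ≤ 85 can be searched exhaustively. Under the same nearest-bin assumption as above, the sketch below picks, for one DTMF frequency, the block size whose bin centre lies closest to it; it reproduces the choices N = 83 for 770 Hz and N = 84 for 1336 Hz quoted in the text.

```python
def bin_centre(f, n, fs=8000):
    """Centre frequency of the DFT bin nearest to f for a block of n samples."""
    return round(f * n / fs) * fs / n


def best_block_size(f, sizes=range(80, 86), fs=8000):
    """Block size in `sizes` whose nearest bin centre lies closest to f."""
    return min(sizes, key=lambda n: abs(bin_centre(f, n, fs) - f))
```

`best_block_size(770)` returns 83 and `best_block_size(1336)` returns 84.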


4. PROGRAM FOR THE DTMF RECEIVER

The DTMF receiver program has been written in the assembly language of the TMS320C5x signal processor. The program for the detection of DTMF signals starts with processor and analog interface initialization. After initialization, the program analyzes blocks of 85 samples according to equation (1). For each DTMF frequency, the program saves six DFT samples for the different numbers N in the range 80 ≤ N ≤ 85. After the samples have been taken, the DFT energies are computed and summed to obtain the overall energy

$$E_k = \sum_{N=80}^{85} |X_k(N)|^2, \qquad (4)$$

where k is the index of the k-th DTMF frequency and $|X_k(N)|^2$ is given by equation (2).
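If the coefficient 2cos(2πf_k/f_s) is used (the correction mentioned in Section 2), one recursion per frequency serves all six block lengths: the energy (2) is simply evaluated after 80, 81, ..., 85 samples and accumulated. The sketch below makes that assumption; with the per-N coefficients cos(2πk/N) of Table 1, separate recursions would be needed.

```python
import math

DTMF_FREQS = [697, 770, 852, 941, 1209, 1336, 1477, 1633]  # Hz


def summed_energies(block, fs=8000, sizes=range(80, 86)):
    """E_k of eq. (4) for every DTMF frequency, from one pass over the block."""
    last = max(sizes)
    energies = {}
    for f in DTMF_FREQS:
        coeff = 2 * math.cos(2 * math.pi * f / fs)   # assumed: same coefficient for all N
        w1 = w2 = 0.0
        total = 0.0
        for n, sample in enumerate(block[:last], start=1):
            w0 = sample + coeff * w1 - w2            # eq. (1)
            w2, w1 = w1, w0
            if n in sizes:                           # n samples processed so far
                total += w1 * w1 + w2 * w2 - coeff * w1 * w2   # eq. (2) for N = n
        energies[f] = total
    return energies
```

For an 85-sample block of digit "5", the largest summed energies of the two groups fall at 770 Hz and 1336 Hz.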

Next, the two frequencies (one from the low and one from the high frequency group) with the highest energies are found. Then it is checked whether these energies are above the threshold and whether the twist complies with the requirements. Next, the energies of the strongest signals are compared to the energies of the remaining tones in each group. Then we compute the energies of the second harmonics corresponding to the strongest signals. These energies should be smaller than the energies of the fundamentals (first harmonics).

After all tests, the program outputs the recognized digit and proceeds to take samples from the next block.
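The sequence of tests above can be written as a small decision function. The threshold and the minimum ratio between the strongest tone and the other tones of its group are hypothetical parameters (the paper does not give their values); the ±5 dB twist limit comes from the requirements in Section 1, evaluated here as a ratio of energies.

```python
import math

LOW = [697, 770, 852, 941]
HIGH = [1209, 1336, 1477, 1633]
KEYPAD = ["123A", "456B", "789C", "*0#D"]


def decide(energies, harmonic_energies, threshold, max_twist_db=5.0, min_ratio_db=6.0):
    """Return the detected keypad symbol, or None if any test fails.

    energies: frequency -> summed energy E_k of the fundamental.
    harmonic_energies: frequency -> energy measured at twice that frequency.
    threshold and min_ratio_db are hypothetical tuning values.
    """
    low = max(LOW, key=energies.get)             # strongest tone of each group
    high = max(HIGH, key=energies.get)
    e_low, e_high = energies[low], energies[high]
    if e_low < threshold or e_high < threshold:  # level test
        return None
    if abs(10 * math.log10(e_high / e_low)) > max_twist_db:   # twist test
        return None
    ratio = 10 ** (min_ratio_db / 10)
    for group, best, e_best in ((LOW, low, e_low), (HIGH, high, e_high)):
        # The strongest tone must dominate the other tones of its group ...
        if any(energies[f] * ratio > e_best for f in group if f != best):
            return None
        # ... and its second harmonic must be weaker than the fundamental.
        if harmonic_energies[best] >= e_best:
            return None
    return KEYPAD[LOW.index(low)][HIGH.index(high)]
```

The energies would come from the summed values of eq. (4); the second-harmonic energies from additional Goertzel detectors at 2f_k.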


5. CONCLUSIONS

We described an algorithm based on the proposed improved method for the detection of DTMF signals. Using a nonuniform Goertzel filter bank makes it possible to reduce the block size to 85 samples. The algorithm occupies the TMS320C50 signal processor for less than about 10% of the time. Therefore, a single processor may be used to receive DTMF signals from multiple PCM channels.
ABSTRACT

A wavelet software package named Uvi_Wave has been developed by the Signal Theory Group at the University of Vigo to provide a simple way to work with wavelets. The toolbox may therefore help in understanding theoretical concepts. Moreover, it provides an experimentation platform for wavelet applications. This paper describes the contents and main features of Uvi_Wave, which has been implemented within the powerful Matlab environment and is freely distributed.


1. INTRODUCTION 

The availability of software tools makes it possible for students and researchers to explore and quickly test the possibilities of new methods. This is very important when dealing with emergent techniques, like wavelets, in the digital signal processing area. Moreover, such tools let the user concentrate on the applications, since they provide the basic algorithms and utilities for experimentation and further wavelet-based developments.

Wavelets can be used to solve very different problems that appear in many areas of electrical engineering, such as speech, audio and image coding, computer graphics, communications, numerical analysis, statistics, etc. [1, 2]. The enthusiasm with which wavelets have been accepted in so many fields makes the development of a complete set of tools worthwhile. With this aim, we present here a wavelet toolbox that can be used as a general package for research and educational projects.

2. UVI-WAVE WAVELET TOOLBOX 

Uvi_Wave provides a set of Matlab command-line functions and demonstrations to analyze and synthesize signals by means of wavelets and wavelet packets. It also includes tools for the design and test of two-channel filter banks. The different routines allow an exhaustive treatment of wavelet-based techniques in a simple and powerful environment.

This work was partially supported by the University of Vigo.

2.1. IMPLEMENTATION 

We have used Matlab as the implementation platform for the tools described in the next section. Matlab has been chosen because it is well known to researchers, engineers and students, and the routines can run on many platforms with minimum changes.

The current version of the toolbox includes 121 Matlab functions and 12 data files with sample signals and filters. The Matlab code is compatible with versions up to 4.0, and all the routines have been tested on Unix [...].

All the functions offer on-line help, with descriptions of the algorithms and hints on their use. Moreover, in a complete reference manual, all the functions are indexed and cross-referenced, and their input/output arguments are explained in detail. To illustrate the usage and application of the functions, the manual introduces some examples, together with an algorithm description if required.

3. STRUCTURE OF THE TOOLBOX 

In the next sections, a short description of the main functions of the Uvi_Wave Toolbox is presented.

3.1. DISCRETE WAVELET TRANSFORM 

Calculation and display of the Discrete Wavelet Transform (DWT) and Inverse Discrete Wavelet Transform (IDWT) for one-dimensional or two-dimensional signals [3]. Table 1 briefly references these functions. There are no restrictions on the signal length. To avoid border effects when reconstructing finite-length signals, the periodic extension method is used [4]. Figure 1 shows the 2-stage wavelet transform of an image of a square with an inscribed cross.


Function    Purpose
wt          1-D Discrete Wavelet Transform
iwt         1-D Inverse Wavelet Transform
wt2d        2-D Discrete Wavelet Transform
iwt2d       2-D Inverse Wavelet Transform

Table 1: Discrete Wavelet Transform



Figure 1: Wavelet decomposition with 2 levels. 
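The periodic-extension DWT is compact enough to sketch. The Python code below is an independent illustration, not the Uvi_Wave `wt`/`iwt` code: it assumes the 4-tap Daubechies orthonormal filter and a coefficient layout [a_J, d_J, ..., d_1] chosen for the example. With periodic extension, each stage is an orthogonal operator, so the inverse stage is simply its transpose.

```python
import math

R2 = math.sqrt(2)
S3 = math.sqrt(3)
# Assumed filter: 4-tap Daubechies orthonormal lowpass (coefficients sum to sqrt(2)).
H = [(1 + S3) / (4 * R2), (3 + S3) / (4 * R2), (3 - S3) / (4 * R2), (1 - S3) / (4 * R2)]
# Highpass by alternating flip: g[k] = (-1)**k * h[L-1-k].
G = [(-1) ** k * H[len(H) - 1 - k] for k in range(len(H))]


def analysis(x):
    """One DWT stage with periodic extension: even-length x -> (approximation, detail)."""
    n = len(x)
    a = [sum(h * x[(2 * i + k) % n] for k, h in enumerate(H)) for i in range(n // 2)]
    d = [sum(g * x[(2 * i + k) % n] for k, g in enumerate(G)) for i in range(n // 2)]
    return a, d


def synthesis(a, d):
    """Inverse stage: the transpose of the (orthogonal) periodic analysis operator."""
    n = 2 * len(a)
    x = [0.0] * n
    for i in range(len(a)):
        for k in range(len(H)):
            x[(2 * i + k) % n] += H[k] * a[i] + G[k] * d[i]
    return x


def dwt_periodic(x, levels):
    """Multi-level DWT; output layout [a_J, d_J, d_(J-1), ..., d_1]."""
    a, details = list(x), []
    for _ in range(levels):
        a, d = analysis(a)
        details.insert(0, d)
    return a + [c for d in details for c in d]


def idwt_periodic(coeffs, levels):
    """Inverse of dwt_periodic() for the same number of levels."""
    size = len(coeffs) >> levels
    a = list(coeffs[:size])
    while size < len(coeffs):
        a = synthesis(a, coeffs[size:2 * size])
        size *= 2
    return a
```

`idwt_periodic(dwt_periodic(x, 3), 3)` returns a 32-sample `x` to within rounding error. In this sketch the length must be a multiple of 2^J, with at least four samples entering the last stage, for the periodic operator to stay orthogonal.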


3.2. SCALE AND WAVELET FUNCTION 

Calculation and display of the discrete approximation to the scaling and wavelet functions. Their computation is based on the cascade algorithm [5]. In the same manner, any basis function in the wavelet packet basis library can be computed. Table 2 lists the functions.


Function    Purpose
wavelet     Wavelet and scale functions
wavepack    Wavelet packet functions calculation

Table 2: Continuous wavelet functions
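The cascade algorithm iterates the two-scale relation φ(t) = √2 Σ h_k φ(2t − k), starting from a unit impulse: each iteration upsamples the current samples by two and convolves them with √2·h, halving the sampling step. One final step with g instead of h yields ψ on the same grid. The sketch below is an illustration with the 4-tap Daubechies filter (an assumption), not the toolbox's `wavelet` function; after a finite number of iterations the samples only approximate φ and ψ.

```python
import math

R2 = math.sqrt(2)
S3 = math.sqrt(3)
H = [(1 + S3) / (4 * R2), (3 + S3) / (4 * R2), (3 - S3) / (4 * R2), (1 - S3) / (4 * R2)]
G = [(-1) ** k * H[len(H) - 1 - k] for k in range(len(H))]


def upsample_convolve(c, f):
    """Insert a zero between the samples of c, then convolve with sqrt(2) * f."""
    up = [0.0] * (2 * len(c) - 1)
    up[::2] = c
    out = [0.0] * (len(up) + len(f) - 1)
    for i, u in enumerate(up):
        for k, fk in enumerate(f):
            out[i + k] += R2 * fk * u
    return out


def cascade(iterations):
    """Approximate samples of phi and psi at t = n / 2**iterations."""
    phi = [1.0]                              # unit impulse
    for _ in range(iterations - 1):
        phi = upsample_convolve(phi, H)      # phi(t) = sqrt2 * sum h_k phi(2t - k)
    psi = upsample_convolve(phi, G)          # psi(t) = sqrt2 * sum g_k phi(2t - k)
    phi = upsample_convolve(phi, H)
    return phi, psi
```

With 8 iterations, `phi` holds 766 samples spaced 1/256 apart, covering the support [0, 3] of the 4-tap scaling function.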


3.3. SCALOGRAM 

Computation of the Continuous Wavelet Transform coefficients can be performed using the functions in Table 3. The representation of its modulus over a time-scale plane (scalogram) is a well-suited tool for signal analysis. The Morlet wavelet is used as the prototype function.

Figure 2 presents the scalogram of a rectangular pulse obtained using scalog.



Figure 2: Scalogram of a rectangular pulse. 


Function    Purpose
morletw     Morlet wavelet calculation
scalog      Scalogram computation
srf         3-D plot of the scalogram

Table 3: Continuous Wavelet Transform
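A scalogram can be computed by direct summation of the CWT. The sketch below is not the toolbox's `scalog`: it assumes the Morlet wavelet ψ(t) = π^(-1/4) e^(jω0 t) e^(-t²/2) with ω0 = 6 (a value chosen for the example; with it, the correction term required for exact admissibility is negligible and is omitted), scales measured in samples, and W(a, b) = a^(-1/2) Σ_t x(t) ψ*((t − b)/a).

```python
import cmath
import math

OMEGA0 = 6.0   # assumed Morlet centre frequency


def morlet(t):
    """Morlet wavelet without the (negligible for OMEGA0 = 6) admissibility correction."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-t * t / 2)


def scalogram(x, scales):
    """|W(a, b)| for every scale a (in samples) and every shift b."""
    rows = []
    for a in scales:
        row = []
        for b in range(len(x)):
            w = sum(v * morlet((t - b) / a).conjugate() for t, v in enumerate(x))
            row.append(abs(w) / math.sqrt(a))
        rows.append(row)
    return rows


pulse = [1.0 if 24 <= t <= 40 else 0.0 for t in range(65)]   # rectangular pulse
S = scalogram(pulse, [2, 4, 8])
```

Since the Morlet wavelet satisfies ψ(−t) = ψ*(t), the scalogram of a real pulse symmetric about its centre is itself symmetric about that centre, which makes a convenient sanity check.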


3.4. MULTIRESOLUTION ANALYSIS 

The objective of these routines is to obtain a representation of the signal at different resolutions. Approximation and detail signals at different scales are therefore computed. There are functions for both one-dimensional and two-dimensional signals, as Table 4 shows.


Function    Purpose
aprox       1-D approximation signals
detail      1-D detail signals
multires    Complete 1-D multiresolution analysis
mres2d      2-D approximation/detail image
nssffb      Non-subsampled filter bank
inssffb     Inverse non-subsampled filter bank
nss2d       2-D non-subsampled filter bank
inss2d      Inverse 2-D non-subsampled filter bank

Table 4: Multiresolution analysis


The other routines in the table implement the FIR filter bank structure that performs the wavelet transform without decimation. It can be used mainly for time or spatial singularity detection. These routines have also been implemented for 1-D and 2-D signals.

Figure 3 displays the original and the approxima¬ 
tion signals (for 2^ to 2'’ scale) obtained with aprox. 
Figure 3: Original and approximation signals.
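A non-subsampled filter bank keeps every output at the input length by dilating the filters (the "algorithme à trous") instead of decimating. The sketch below uses the Haar pair with periodic extension, an assumption made for brevity; the toolbox routines accept general filters. A step in the input shows up as an isolated peak in the finest detail signal, which is the singularity-detection use mentioned above (with periodic extension, the wrap-around from the last sample to the first also counts as an edge).

```python
def undecimated_haar(x, levels):
    """Non-subsampled Haar filter bank with periodic extension.

    At level j the two-tap filters are dilated by 2**j instead of decimating,
    so every output has the same length as the input.
    """
    n = len(x)
    a = list(x)
    details = []
    for j in range(levels):
        step = 1 << j
        prev = [a[(i - step) % n] for i in range(n)]
        details.append([(u - v) / 2 for u, v in zip(a, prev)])
        a = [(u + v) / 2 for u, v in zip(a, prev)]
    return a, details


def inverse_undecimated_haar(a, details):
    """a_j = a_(j+1) + d_(j+1) at every level, so reconstruction is a plain sum."""
    x = list(a)
    for d in details:
        x = [u + v for u, v in zip(x, d)]
    return x
```

For a step from 0 to 1 at sample 20 of a 32-sample signal, the finest detail signal equals 0.5 at index 20 and -0.5 at index 0 (the wrap-around edge), and is zero elsewhere.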


3.5. WAVELET PACKETS 

Discrete-time algorithms for the Wavelet Packet Transform are implemented, with the same capabilities as for the Wavelet Transform. Two formats are allowed to encode the basis (i.e., the filter bank tree) for one-dimensional transforms: natural and frequency order [6]. Some tools to manage the different formats have been included. Table 5 lists most of these routines, together with those of the next section.

Moreover, the viewing utilities allow the user to plot the time-frequency plane tiling performed by a given wavelet packet basis, or the corresponding filter bank tree scheme.
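Two orderings are needed because, with two-channel filtering and decimation, every highpass branch mirrors the spectrum, so the natural (filter bank tree) index of a node does not match its position along the frequency axis. Assuming idealized half-band filters, the frequency position of the node with natural index n at a given level is the inverse Gray code of n. The sketch shows only this index permutation; it is not the basis encoding used by Uvi_Wave.

```python
def gray(f):
    """Natural-order index of the node at frequency position f."""
    return f ^ (f >> 1)


def inverse_gray(n):
    """Frequency position of the node with natural-order index n."""
    f = 0
    while n:
        f ^= n
        n >>= 1
    return f
```

At level 3, the natural-order nodes 0 to 7 occupy the frequency positions 0, 1, 3, 2, 7, 6, 4, 5.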

3.6. BASIS SELECTION ALGORITHMS 

In addition to the wavelet packet transform for one-dimensional or two-dimensional signals, different basis selection algorithms have been included, with additive [6, 7] and non-additive [8, 7] cost functionals.

Basis selection algorithms for 1-D signals comprise pruning [7] and growth [9] types. Each of them has a different implementation according to whether the cost function used is additive. For 2-D signals, only the pruning algorithm with additive costs (best basis algorithm [7]) has been implemented. All these functions are listed in Table 5.
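For an additive cost, the pruning (best basis) algorithm compares, bottom-up, the cost of each node with the best total cost of its two children, and keeps the cheaper option. The sketch below assumes Haar filters and the l^1 norm as the cost (the toolbox provides several functionals, listed in Table 5); nodes are identified as (level, natural index) pairs, and the function names are hypothetical.

```python
import math


def haar_split(x):
    """One two-channel Haar stage: (lowpass, highpass), each half as long."""
    r = 1 / math.sqrt(2)
    low = [(x[2 * i] + x[2 * i + 1]) * r for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) * r for i in range(len(x) // 2)]
    return low, high


def l1_cost(c):
    """Additive cost: l^1 norm (smaller means a sparser representation)."""
    return sum(abs(v) for v in c)


def best_basis(x, max_depth, cost=l1_cost, level=0, index=0):
    """Bottom-up pruning of the wavelet packet tree with an additive cost.

    Returns (cost, leaves), where leaves are (level, natural index) pairs.
    """
    own = cost(x)
    if level == max_depth or len(x) < 2:
        return own, [(level, index)]
    low, high = haar_split(x)
    c_low, l_low = best_basis(low, max_depth, cost, level + 1, 2 * index)
    c_high, l_high = best_basis(high, max_depth, cost, level + 1, 2 * index + 1)
    if own <= c_low + c_high:          # splitting does not pay off: keep the parent
        return own, [(level, index)]
    return c_low + c_high, l_low + l_high
```

For a constant 8-sample signal, the selected basis is the ordinary wavelet basis: leaves (3, 0), (3, 1), (2, 1) and (1, 1).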

3.7. FILTER BANK DESIGN 

A set of algorithms that yield several filter families has been implemented. The selected design algorithms include orthogonal [10, 5, 11, 12] and biorthogonal families [5], as presented in Table 6. Some functions to compute filter regularity estimates are also included.


Function    Purpose
wpk         Wavelet Packet Transform
iwpk        Inverse Wavelet Packet Transform
wpk2d       2-D Wavelet Packet Transform
iwpk2d      2-D Inverse Wavelet Packet Transform
pruneadd    Pruning algorithm for additive costs
prunenon    Pruning algorithm for non-additive costs
growadd     Growth algorithm for additive costs
grownon     Growth algorithm for non-additive costs
prune2d     Quadtree pruning for additive costs
lpenerg     Energy with l^p norm
shanent     Shannon entropy
logenerg    "Log energy" functional
cwent       Coifman-Wickerhauser entropy
weaklp      Weak l^p norm
cmparea     Compression area
cmpnum      Compression number

Table 5: Wavelet Packets


Function    Purpose
wspline     Spline biorthogonal filters
daub        Daubechies orthogonal filters
symlets     Least-asymmetric orthogonal filters
maxflat     Maximally flat orthogonal filters
lemarie     Battle-Lemarié orthogonal filters
remezflt    Remez solution orthogonal filters
tempreg     Hölder regularity temporal estimate
specreg     Regularity spectral estimate
regdaub     Hölder regularity for Daubechies filters

Table 6: Wavelet filters generation
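Whatever design routine produces an orthogonal lowpass filter, a few algebraic conditions must hold. The sketch below is an independent check, not a Uvi_Wave function: it tests the normalization, the double-shift orthonormality and the number of vanishing moments of the associated highpass filter g[k] = (-1)^k h[L−1−k]. Biorthogonal pairs would need a cross-check between analysis and synthesis filters instead.

```python
import math


def check_orthonormal_lowpass(h, vanishing_moments=1, tol=1e-10):
    """True if h satisfies the usual orthonormal wavelet lowpass conditions."""
    L = len(h)
    if abs(sum(h) - math.sqrt(2)) > tol:              # normalization: sum h = sqrt(2)
        return False
    for shift in range(0, L, 2):                      # sum_k h[k] h[k+shift] = delta(shift)
        s = sum(h[k] * h[k + shift] for k in range(L - shift))
        if abs(s - (1.0 if shift == 0 else 0.0)) > tol:
            return False
    g = [(-1) ** k * h[L - 1 - k] for k in range(L)]
    for p in range(vanishing_moments):                # sum_k k^p g[k] = 0
        if abs(sum(k ** p * gk for k, gk in enumerate(g))) > tol:
            return False
    return True


S3 = math.sqrt(3)
DB2 = [(1 + S3) / (4 * math.sqrt(2)), (3 + S3) / (4 * math.sqrt(2)),
       (3 - S3) / (4 * math.sqrt(2)), (1 - S3) / (4 * math.sqrt(2))]
print(check_orthonormal_lowpass(DB2, vanishing_moments=2))   # True
```

The 4-tap Daubechies filter passes with two vanishing moments, whereas the Haar filter passes with one but fails with two.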


3.8. SUBBAND MANAGEMENT UTILITIES 

Utilities for locating, extracting or inserting any subband are provided. These functions make it possible to process the content of a single subband separately, if desired. All of them work with wavelet and wavelet packet transforms, and with any signal size. For the wavelet case, maxima extraction, minima deletion or local extrema extraction can also be performed. Table 7 lists most of these utilities.
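Locating a subband reduces to index arithmetic once a coefficient layout is fixed. The helpers below are hypothetical (they do not reproduce the signatures of `bandsite`, `bandext`, `bandins` or `bandmax`); they assume a 1-D DWT vector laid out as [a_J, d_J, d_(J-1), ..., d_1] with a length divisible by 2^J, in which the detail band d_j occupies indices N/2^j to N/2^(j−1).

```python
def band_site(n, levels, band):
    """(start, stop) of a subband: band 0 is a_J, band j (1..levels) is d_j."""
    if band == 0:
        return 0, n >> levels
    if not 1 <= band <= levels:
        raise ValueError("band must be 0 (approximation) or 1..levels (detail)")
    return n >> band, n >> (band - 1)


def band_extract(coeffs, levels, band):
    start, stop = band_site(len(coeffs), levels, band)
    return coeffs[start:stop]


def band_insert(coeffs, levels, band, values):
    start, stop = band_site(len(coeffs), levels, band)
    if len(values) != stop - start:
        raise ValueError("subband length mismatch")
    return coeffs[:start] + list(values) + coeffs[stop:]


def band_maxima(band_values, keep):
    """Keep the `keep` largest-magnitude coefficients of a band, zero the rest."""
    order = sorted(range(len(band_values)), key=lambda i: abs(band_values[i]), reverse=True)
    kept = set(order[:keep])
    return [v if i in kept else 0 for i, v in enumerate(band_values)]
```

For a 32-sample, 3-level transform, d_1 occupies indices 16 to 31 and a_3 indices 0 to 3.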

3.9. DEMONSTRATION SCRIPTS 

In addition, some demos and test signals have been included, illustrating the main features and capabilities of Uvi_Wave, how the functions are called, and some applications. All the demonstration functions can be accessed via the user-friendly menu shown in Figure 4.

All the messages included have an educational goal, introducing short explanations of the main concepts shown in the demos. Figure 2 shows the scalogram of a rectangular pulse obtained during the demo. Besides these demos, some scripts help with the 1-D basis formats and the 2-D transform output format.

Function    Purpose
bandsite    WT subband localization
bandext     WT subband extraction
bandins     WT subband insertion
bandmax     WT subband maxima extraction
elmin       WT minima removal
localext    WT local extrema extraction
bandadj     2-D subband normalization
siteband    Wavelet packet subband localization
extband     Wavelet packet subband extraction
insband     Wavelet packet subband insertion

Table 7: Subband management

3.10. AVAILABILITY 

Uvi_Wave has been written as an educational and research aid. It is therefore freely distributed through the Internet, under the GNU General Public License. It is available at the ftp server of the Communication Technologies Department (ftp.tsc.uvigo.es), and there is a related WWW page with information and links to the toolbox (http://www.tsc.uvigo.es/~wavelets/Uvi_Wave.html).

Around 600 copies of the latest version of Uvi_Wave have been downloaded by anonymous ftp. Furthermore, about 100 people participate in a mailing list for general discussions on wavelets, reports on the toolbox performance, and help on using it.



Figure 4: Menu for the main demo function. 

4. CONCLUSIONS 

A comprehensive and easy-to-use toolbox for understanding wavelets and researching their applications has been presented. It can be used on a wide range of platforms, since it has been implemented in Matlab. The experience over the last two years has shown its usefulness in a large variety of contexts. Further developments comprise cosine packets, additional wavelet filter types, speeding up some functions by means of MEX routines, graphical interactive tools, etc.
Abstract: Using wavelet networks, it is possible to capture the characteristics of non-linear dynamic systems in a multi-scale strategy. Starting from the coarsest approximation (coarsest scale), we proceed step-wise to the finer scales. At each step, the error signal, or residue, of the system is modeled. This procedure is repeated until the residual drops below some modeling error bound. The modeling is carried out using compactly supported biorthogonal wavelets. By choosing an appropriate wavelet basis, it is possible to obtain a near-optimal model.

I. Introduction 

In system modeling, a mathematical model must be built for an unknown system based on a finite set of observed data. This is the case in many applications, such as channel equalization, plant identification, signal compression and so forth. The model uses the observed data to capture the features of the system and, when excited with a new set of input signals, it is expected to follow the system behavior closely. The fundamental issue is to keep the size of the model as small as possible.

Many solution methods have been proposed in the literature. Recent developments include the use of neural networks and fuzzy systems [1], [2], [3]. The dominant problem in all of these approaches is the excessive size of the model, which makes the update algorithms complex and slow.

In this paper we present networks that use wavelets as the source of non-linearity. The method uses so-called multi-scale modeling. Owing to the existence of biorthogonal wavelet pairs, the weight vectors are determined by an inner-product technique. This makes the update algorithm simple and fast.

The use of wavelet networks for non-linear system modeling is not a new concept; it has been discussed in [4], [5] and [6]. However, in these works the multi-resolution aspect of wavelet networks is not fully considered. Here, in contrast, we develop a more systematic modeling strategy by exploiting the additional properties offered by the multi-resolution nature of the wavelet network. This is done through so-called multi-scale modeling. In multi-scale modeling, the problem is subdivided into several sub-problems, and the modeling is carried out starting from the coarsest approximation and then proceeding to the finest scale. The advantage of this approach over the usual fine-to-coarse analysis is that, at each finer scale, the model approximates the system at increasing resolution. This is equivalent to first capturing the most general features of the system and then systematically proceeding to its detailed characteristics.

II. Problem Definition 

Let a collection of observed inputs $U_t = \{\mathbf{u}_t\}$ and the corresponding collection of outputs $Y_t = \{\mathbf{y}_t\}$, $t = 0, 1, 2, \ldots$, of an unknown non-linear discrete-time multivariable dynamical system of the form

$$\mathbf{y}_{t+1} = \mathbf{f}(\mathbf{u}_t, \mathbf{y}_t), \qquad (1)$$

be given, where $\mathbf{u}_t$ is an $n$-dimensional input vector and $\mathbf{y}_t$ an $m$-dimensional output vector:

$$\mathbf{u}_t = [\, u_t \;\; u_{t-1} \;\; u_{t-2} \;\; \cdots \,],$$
$$\mathbf{y}_t = [\, y_{t-1} \;\; y_{t-2} \;\; \cdots \;\; y_{t-m+1} \,].$$

Assume that $\mathbf{u}_t$ and $\mathbf{y}_t$ are confined within a compact region $C$ of arbitrary shape. We intend to construct a wavelet network model for (1) over $C$ using the so-called multi-scale strategy to meet a prescribed modeling error bound. For convenience, we define an $l = (n + m)$-dimensional vector $\mathbf{x}_t = [\,\mathbf{u}_t \;\; \mathbf{y}_t\,]^T$ such that (1) becomes $\mathbf{y}_{t+1} = \mathbf{f}(\mathbf{x}_t)$. Note that for a static system $\mathbf{x}_t = \mathbf{u}_t$.

It can be shown [7] that the non-linear modeling problem of (1) is well defined if $\mathbf{f}(\cdot) \in L^2(\mathbb{R}^l)$. We consider systems in this category. Note that the above condition is easily met if $\mathbf{f}(\cdot)$ is continuous over the compact region $C$.

III. Multi-scale Modeling 

Multi-scale modeling is inherent in the wavelet network; it exploits the multi-resolution nature of signal expansion using wavelets. Multi-resolution analysis is well discussed in many papers and standard wavelet texts. In nearly all applications, the analysis is performed from fine to coarse resolution (scale). In system modeling, however, where the model has to capture both global and local features, a more natural way of learning is to go from coarse to fine scale. In doing so, we can gradually localize global features. This is equivalent to the well-known process of zooming in.

The advantage of the global/local approach is that, at each finer scale, we model the residual of the previous scales. We proceed with the coarse-to-fine approach, each time modeling the residual, until the modeling error falls below a certain error bound. If a large modeling error bound is tolerable, then we only have to capture the global features and can ignore the local ones.
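The coarse-to-fine loop can be illustrated on a one-dimensional static system. The sketch below assumes orthonormal Haar wavelets, the simplest case in which the analysis and synthesis wavelets coincide, so that each weight is a plain inner product of the current residual with a wavelet; the paper itself uses compactly supported biorthogonal pairs and multivariable inputs. Names and the stopping rule (RMS residual below a bound) are choices made for the example.

```python
def haar(x):
    """Haar mother wavelet on [0, 1)."""
    if 0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1:
        return -1.0
    return 0.0


def multiscale_fit(xs, ys, error_bound, max_scale=10):
    """Coarse-to-fine modeling: a constant first, then at each scale j the
    wavelets psi(2**j x - k), k = 0..2**j - 1, are fitted to the residual.

    Returns (model terms, number of wavelet scales used, final RMS error).
    """
    n = len(xs)
    mean = sum(ys) / n
    terms = [("const", mean)]
    residual = [y - mean for y in ys]
    rms = (sum(r * r for r in residual) / n) ** 0.5
    scales = 0
    while rms >= error_bound and scales < max_scale:
        j = scales
        for k in range(2 ** j):
            basis = [haar(2 ** j * x - k) for x in xs]
            energy = sum(b * b for b in basis)
            if energy == 0:
                continue
            c = sum(r * b for r, b in zip(residual, basis)) / energy  # inner product
            terms.append(((j, k), c))
            residual = [r - c * b for r, b in zip(residual, basis)]   # model the residue
        scales += 1
        rms = (sum(r * r for r in residual) / n) ** 0.5
    return terms, scales, rms


xs = [(i + 0.5) / 64 for i in range(64)]
terms, scales, rms = multiscale_fit(xs, xs, error_bound=0.02)   # f(x) = x
```

For f(x) = x sampled at 64 points and an RMS bound of 0.02, the loop stops after four wavelet scales (a constant plus 15 wavelet terms); a looser bound stops earlier, capturing only the global shape.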

A. The Modeling 

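Let $\varphi$ and $\psi$ represent the analysis and synthesis biorthogonal wavelet pair, respectively. We want to model the system transfer $\mathbf{y}_{t+1} = \mathbf{f}(\mathbf{x}_t)$ of (1) using a linear combination of scaled and translated versions of the synthesis mother wavelet $\psi(\mathbf{x}_t)$¹, as

$$\hat{\mathbf{y}}_{t+1} = \hat{\mathbf{f}}(\mathbf{x}_t) = \sum_{j} \sum_{k \in K_j} \mathbf{c}_{j,k}\, \psi_{j,k}(\mathbf{x}_t), \qquad (2)$$

where $\psi_{j,k}(\mathbf{x}_t) = \psi(2^j \mathbf{x}_t - k)$, and the $m$-dimensional coefficient vector $\mathbf{c}_{j,k}$ is obtained from²

$$\mathbf{c}_{j,k} = \langle \mathbf{f}(\mathbf{x}_t),\, \varphi_{j,k}(\tilde{\mathbf{x}}_t) \rangle, \qquad (3)$$

with the scaling factor $j = 0, 1, 2, \ldots$ and the translation factor $k \in K_j = \{k_1, k_2, \ldots, k_{n_j}\}$, $n_j$ denoting the number of wavelet nodes at scale $j$.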

A.1 More about ψ(x)

Given an excitation signal $x(t)$ and a wavelet basis $\psi$, $\psi(x)$ is a reordered version of part of the samples of $\psi$ and can therefore be expressed as

$$\psi(x) = \psi P_x, \qquad (4)$$

where $P_x$ represents a sort of "permutation" matrix. It may be neither full rank nor square. If it is full rank, then we say that $x(t)$ visits all the samples of $\psi$, and we can perfectly reconstruct $\psi$ from $\psi(x)$. On the other hand, if $P_x$ is not full rank, then $x(t)$ does not visit all the samples of $\psi$. However, if the missing samples constitute a minor portion of the total energy, we can still recover $\psi$ with reasonably good precision.
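For sampled signals, $P_x$ in (4) can be read as a selection matrix. The sketch below makes this interpretation explicit under an assumption made for illustration: the excitation $x(t)$ is an integer index into the sampled wavelet $\psi$, so column $t$ of $P_x$ holds a single 1 in row $x(t)$. Full (row) rank then means that every sample is visited, and only visited samples can be recovered.

```python
def excite(psi, visits):
    """psi(x) = psi P_x: read the wavelet samples at the indices visited by x(t)."""
    return [psi[i] for i in visits]


def selection_matrix(num_samples, visits):
    """P_x with P[i][t] = 1 when x(t) visits sample i (num_samples x len(visits))."""
    return [[1 if v == i else 0 for v in visits] for i in range(num_samples)]


def is_full_rank(P):
    """Rows of a selection matrix have disjoint supports, so P has full row
    rank exactly when every row (every wavelet sample) contains a 1."""
    return all(any(row) for row in P)


def recover(num_samples, visits, excited):
    """Right inverse of P_x: put each excited value back at its sample index.
    Samples that were never visited stay None (they cannot be recovered)."""
    psi = [None] * num_samples
    for i, value in zip(visits, excited):
        psi[i] = value
    return psi
```

With visits 2, 0, 1, 3, 1 on a 4-sample wavelet, all samples are recovered; with visits 0, 1, 1, the last two samples are lost.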


A.2 Determining the biorthogonal pair for ψ(x)

Consider the biorthogonal wavelet pair $(\varphi, \psi)$. From biorthogonality, we know that the inner product $\langle \varphi, \psi \rangle = I$; this means that $\varphi$ and $\psi$ form a biorthogonal set.

¹ To be defined soon, in the next subsection.
² Note that we use a different excitation signal $\tilde{\mathbf{x}}_t$ for the analysis wavelet. This will be clear soon.

Assume that we excite $\psi$ with a signal $x(t)$ such that, using (4), we obtain $\psi(x) = \psi P_x$. If $P_x$ is full rank, then we can find $\tilde{P}_x$ such that $P_x \tilde{P}_x^T = I$. Let us now generate a secondary excitation signal $\tilde{x}(t)$ that excites the analysis wavelet nodes such that $\varphi(\tilde{x}) = \varphi \tilde{P}_x$. Then we have $\varphi(\tilde{x})\, \psi(x)^T = I$. This means that $\varphi$ and $\psi$ remain a biorthogonal pair even after they are excited with the signals $\tilde{x}(t)$ and $x(t)$, respectively. Hence we can use (2) to approximate $\mathbf{f}$, where the weight coefficients are given by (3), and $\tilde{\mathbf{x}}_t$ is such that, with $\psi(\mathbf{x}_t) = \psi P_{\mathbf{x}_t}$ and $\varphi(\tilde{\mathbf{x}}_t) = \varphi \tilde{P}_{\mathbf{x}_t}$, $P_{\mathbf{x}_t} \tilde{P}_{\mathbf{x}_t}^T = I$³.

B. The wavelet network

The wavelet network, as shown in Fig. 1, is a three-layer feed-forward network. The first layer contains the wavelet nodes, the second layer the linear nodes and the third layer the summation nodes. To give a clearer picture, the details of the wavelet network at scale $j$ are presented in Fig. 2.



Fig. 1. The structure of the wavelet network 





Fig. 2. The wavelet network at scale j 


B.1 The wavelet nodes

The $\mathbb{R}^l \to \mathbb{R}$ activation functions of the wavelet nodes in the first layer are generated by dilating and translating a predetermined mother wavelet $\psi$. With $x_{p,t}$ representing the $p$-th row of $\mathbf{x}_t$, $\psi(\mathbf{x}_t)$ is defined as

$$\psi(\mathbf{x}_t) = \prod_{p=0}^{l-1} \psi(x_{p,t}) \qquad (5)$$

and its biorthogonal complement $\varphi(\tilde{\mathbf{x}}_t)$ as

$$\varphi(\tilde{\mathbf{x}}_t) = \prod_{p=0}^{l-1} \varphi(\tilde{x}_{p,t}) \qquad (6)$$

³ This relation is properly shown in Proposition III-B.1.

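Proposition III-B.1: The wavelet functions $\psi(\mathbf{x}_t)$ and $\varphi(\tilde{\mathbf{x}}_t)$ defined in (5) and (6) form a biorthogonal wavelet pair in $L^2(\mathbb{R}^l)$ if

1. $x_{p,t}$, the $p$-th row of $\mathbf{x}_t$, visits all the samples of $\psi$ for all $p = 0, 1, \ldots, l-1$, and

2. the wavelet families

$$\mathcal{S}_j = [\, \Psi_0(\mathbf{x}_t) \;\; \Psi_1(\mathbf{x}_t) \;\; \cdots \;\; \Psi_j(\mathbf{x}_t) \,]^T$$

and

$$\mathcal{A}_j = [\, \Phi_0(\tilde{\mathbf{x}}_t) \;\; \Phi_1(\tilde{\mathbf{x}}_t) \;\; \cdots \;\; \Phi_j(\tilde{\mathbf{x}}_t) \,]^T$$

are such that

$$\mathcal{S}_j \mathcal{A}_j^T = I \qquad (7)$$

for $j = 0, 1, \ldots$, where $\Psi_j(\mathbf{x}_t)$ and $\Phi_j(\tilde{\mathbf{x}}_t)$ are defined as

$$\Psi_j(\mathbf{x}_t) = [\, \psi_{j,k_1}(\mathbf{x}_t) \;\; \psi_{j,k_2}(\mathbf{x}_t) \;\; \cdots \;\; \psi_{j,k_{n_j}}(\mathbf{x}_t) \,]^T \qquad (8)$$

$$\Phi_j(\tilde{\mathbf{x}}_t) = [\, \varphi_{j,k_1}(\tilde{\mathbf{x}}_t) \;\; \varphi_{j,k_2}(\tilde{\mathbf{x}}_t) \;\; \cdots \;\; \varphi_{j,k_{n_j}}(\tilde{\mathbf{x}}_t) \,]^T \qquad (9)$$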

Note that condition (7) in the proposition implies that, for $j = 0, 1, 2, \ldots$, the wavelet family pair $\mathcal{S}_j$ and $\mathcal{A}_j$ forms a biorthogonal basis in a subspace $V_j \subset V$, where $V$ is the space in which the signal $\mathbf{f}(\mathbf{x}_t)$ is defined.

Proof: We avoid detailed mathematics and outline the proof at an abstract level. Let $\psi(t)$ and $\varphi(t)$ represent an $\mathbb{R}^l \rightarrow \mathbb{R}$ multivariable biorthogonal wavelet pair. Note that $\psi(t)$ and $\varphi(t)$ could be obtained using (5) and (6), respectively. Then, using (4), $\psi(x_t)$ and $\varphi(x_t)$ can be expressed as

$$\psi(x_t) = \psi(t) P_{x_t}$$

$$\varphi(x_t) = \varphi(t) \tilde{P}_{x_t}$$

where $P_{x_t}$ and $\tilde{P}_{x_t}$ are permutation operators in appropriate spaces. If $x_{p,t}$ visits all the samples of $\psi$ for all $p = 0, 1, \ldots, l-1$, then $P_{x_t}$ is full rank, and we can find $\tilde{P}_{x_t}$ such that $P_{x_t}\tilde{P}_{x_t}^T = I$. With this choice of $\tilde{P}_{x_t}$, we have

$$\psi(x_t)\varphi(x_t)^T = \psi(t) P_{x_t}\tilde{P}_{x_t}^T \varphi(t)^T = \psi(t)\varphi(t)^T = I \tag{10}$$
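The permutation argument can be illustrated with small sampled vectors. The values of $\psi(t)$ and $\varphi(t)$ and the visiting order below are made up. The sketch shows only one thing. If $P_{x_t}$ is a permutation matrix, choosing $\tilde{P}_{x_t} = P_{x_t}$ gives $P_{x_t}\tilde{P}_{x_t}^T = I$, so reordering the samples leaves the product in (10) unchanged.

```python
def permutation_matrix(order):
    """P with P[r][c] = 1 when sample r is moved to position c = order[r]."""
    n = len(order)
    P = [[0.0] * n for _ in range(n)]
    for r, c in enumerate(order):
        P[r][c] = 1.0
    return P


def row_times(v, M):
    """Row vector times matrix: (v M)[c] = sum_r v[r] M[r][c]."""
    return [sum(v[r] * M[r][c] for r in range(len(v))) for c in range(len(M[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def times_transpose(A, B):
    """A B^T for matrices stored as lists of rows."""
    return [[dot(ra, rb) for rb in B] for ra in A]


# Made-up sampled pair with <psi, phi> = 1 (not orthogonal: psi != phi).
psi_t = [1.0, 0.0, -1.0, 0.0]
phi_t = [0.5, 0.5, -0.5, -0.5]

order = [2, 0, 3, 1]                # hypothetical visiting order of the samples
P = permutation_matrix(order)
P_tilde = P                         # a permutation matrix satisfies P P^T = I

psi_x = row_times(psi_t, P)         # psi(x_t) = psi(t) P_{x_t}
phi_x = row_times(phi_t, P_tilde)   # phi(x_t) = phi(t) P~_{x_t}

print(dot(psi_t, phi_t), dot(psi_x, phi_x))  # 1.0 1.0
```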


B.2 The linear node 

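The second layer in the wavelet network is a linear node; it performs the computation⁴

$$y_{i,j} = \sum_{n=1}^{n_j} c_{i,j,k_n}\, \psi_{j,k_n}(x_t), \qquad i = 0, 1, \ldots, m-1 \tag{11}$$

Or, letting $y_j = [y_{0,j}\ y_{1,j}\ \cdots\ y_{m-1,j}]^T$ (see Fig. 2), we can rewrite the last equation as

$$y_j = C_j \Psi_j(x_t), \tag{12}$$

where the $m \times n_j$ weight matrix $C_j$ is

$$C_j = \begin{bmatrix} c_{0,j,k_1} & \cdots & c_{0,j,k_{n_j}} \\ \vdots & \ddots & \vdots \\ c_{m-1,j,k_1} & \cdots & c_{m-1,j,k_{n_j}} \end{bmatrix} \tag{13}$$

and $\Psi_j(x_t)$ is the $n_j \times 1$ wavelet vector given in (8).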
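The linear node in (11) and (12) is a plain matrix-vector product. It takes the $n_j$ wavelet outputs $\Psi_j(x_t)$ and an $m \times n_j$ weight matrix $C_j$, and returns $y_j$. The function name `linear_node` and the numbers are hypothetical.

```python
def linear_node(C_j, Psi_j):
    """y_{i,j} = sum_n c_{i,j,k_n} * psi_{j,k_n}(x_t) for i = 0..m-1, as in (11)."""
    if any(len(row) != len(Psi_j) for row in C_j):
        raise ValueError("C_j must be m x n_j with n_j = len(Psi_j)")
    return [sum(c * w for c, w in zip(row, Psi_j)) for row in C_j]


C_j = [[1.0, 2.0, 0.0],
       [0.5, -1.0, 3.0]]   # m = 2 outputs, n_j = 3 wavelet nodes (made-up values)
Psi_j = [1.0, -1.0, 0.5]   # made-up wavelet outputs Psi_j(x_t)
print(linear_node(C_j, Psi_j))  # [-1.0, 3.0]
```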


B.3 The summation node

The transfer function of the third and final layer is given by

$$\hat{y}_i = \sum_{j=0}^{J} y_{i,j} \tag{14}$$

where $\hat{y}_i$, $i = 0, 1, \ldots, m-1$, is the $i$-th entry of $\hat{y}_{t+1}$, which in turn is the model approximation of $y_{t+1}$. Thus, combining (12) and (14), we obtain the transfer function of the wavelet network model for (1):

$$\hat{y}_{t+1} = \hat{f}(x_t) = \sum_{j=0}^{J} C_j \Psi_j(x_t). \tag{15}$$
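The three layers combine into the forward pass (15), sketched below. It assumes the Haar mother wavelet and dyadic dilation for the wavelet nodes. The per-scale translation lists and the weight matrices are hypothetical inputs, not values from the paper.

```python
import math


def haar(u):
    """1-D Haar mother wavelet (assumed choice of psi)."""
    if 0.0 <= u < 0.5:
        return 1.0
    if 0.5 <= u < 1.0:
        return -1.0
    return 0.0


def psi_jk(x, j, k):
    """Tensor-product node psi_{j,k}(x_t), assuming 2^{j/2} psi(2^j x_p - k_p) per coordinate."""
    value = 1.0
    for x_p, k_p in zip(x, k):
        value *= 2.0 ** (j / 2.0) * haar(2.0 ** j * x_p - k_p)
    return value


def wavelet_network(x, translations, weights):
    """Forward pass y_hat_{t+1} = sum_{j=0}^{J} C_j Psi_j(x_t), as in (15).

    translations[j] holds k_1, ..., k_{n_j} for scale j (first layer);
    weights[j] is the m x n_j matrix C_j (second layer); the running sum
    over j plays the role of the summation nodes (third layer).
    """
    m = len(weights[0])
    y_hat = [0.0] * m
    for j, (ks, C_j) in enumerate(zip(translations, weights)):
        Psi_j = [psi_jk(x, j, k) for k in ks]                          # (8)
        y_j = [sum(c * w for c, w in zip(row, Psi_j)) for row in C_j]  # (12)
        y_hat = [acc + v for acc, v in zip(y_hat, y_j)]                # (14)
    return y_hat


# Hypothetical network: l = 1 input, m = 1 output, scales j = 0 and j = 1.
translations = [[[0]], [[0], [1]]]
weights = [[[2.0]], [[1.0, -1.0]]]
print(wavelet_network([0.3], translations, weights))  # approximately [2 - sqrt(2)]
```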

The wavelet network described by (15) is a universal approximator. This is a consequence of the so-called multi-resolution behavior of the wavelet expansion of signals, which is discussed in [8] and stated in the following lemma.

Lemma III-B.1: Any nonlinear system of the form (1) can be approximated arbitrarily closely by (15) for some large enough scale $J$; i.e., given an arbitrary error bound $\epsilon$, there exists a scale $J$ such that

$$\|f(x_t) - \hat{f}(x_t)\| < \epsilon.$$
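The lemma is not proved here. The sketch below only illustrates numerically the multi-resolution behavior it rests on. It projects a sampled 1-D function onto successively finer Haar approximation spaces and records the RMS error. The Haar choice and the test function are assumptions. Averaging over $2^{\text{level}}$ dyadic blocks is the orthogonal projection onto the Haar space at that level, and because these spaces are nested, the error cannot increase as the level grows.

```python
import math


def haar_projection(samples, level):
    """Orthogonal projection onto piecewise constants on 2**level dyadic blocks."""
    n = len(samples)
    width = n // (2 ** level)
    approx = []
    for start in range(0, n, width):
        block = samples[start:start + width]
        mean = sum(block) / len(block)
        approx.extend([mean] * len(block))
    return approx


def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


N = 256
f = [math.sin(2 * math.pi * (n + 0.5) / N) for n in range(N)]  # assumed test signal
errors = [rms_error(f, haar_projection(f, level)) for level in range(8)]
for level, err in enumerate(errors):
    print(level, round(err, 4))
```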

C. The procedure 

Let $s$ input-output data points $\{x_t\}$ and $\{y_{t+1}\}$, $t = 0, 1, 2, \ldots, s-1$, of (1) be represented by $X$ and $Y$, respectively. $X$ and $Y$ are $l \times s$ and $m \times s$ matrices, respectively. Let also $\hat{Y}_j$ represent the system approximation at scale $j$ such that $\hat{Y} = \sum_j \hat{Y}_j$. Note that $\hat{Y}_j$ is also an $m \times s$ matrix.

⁴For the notations, refer to Fig. 2.

Then, the multi-scale modeling procedure at scale $j$ is obtained by decomposing (1) into (refer to Fig. 3)

$$\Delta Y_{j-1} = \hat{Y}_j + \Delta Y_j \tag{16}$$

$$\Delta Y_{j-1} = C_j \Psi_j(X) + \Delta Y_j \tag{17}$$

Here $\hat{Y}_j = C_j\Psi_j(X)$ is the approximation of $\Delta Y_{j-1}$ (the residue at the previous scale), and $\Delta Y_j$ is the residue at the current scale. $C_j$ is the $m \times n_j$ weight matrix defined in (13), and $\Psi_j(X) = [\Psi_j(x_0)\ \Psi_j(x_1)\ \cdots\ \Psi_j(x_{s-1})]$ is an $n_j \times s$ matrix, where $n_j$ is the number of translated wavelet nodes at scale $j$.

For $j = 0$, $\Delta Y_{j-1} = \Delta Y_{-1} = Y$. Thus, at the coarsest scale we model the system output, and at all other scales ($j > 0$) we model the system residue $\Delta Y_{j-1}$.
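The coarse-to-fine procedure of (16) and (17) can be sketched as follows. Each scale fits the residue left by the previous one. The sketch makes three assumptions. It uses unit-norm sampled Haar wavelets as $\Psi_j(X)$, with $\Phi_j(X) = \Psi_j(X)$ so that $\Psi_j(X)\Phi_j(X)^T = I$. It takes the weights from the inner-product rule $C_j = \Delta Y_{j-1}\Phi_j(X)^T$. Its made-up data $Y$ has zero mean, because Haar wavelets have zero mean and cannot represent a constant offset.

```python
import math


def haar_rows(n_samples, j):
    """Psi_j(X): the 2**j sampled Haar wavelets at scale j, each of unit norm."""
    width = n_samples // (2 ** j)
    amp = math.sqrt((2 ** j) / n_samples)
    rows = []
    for k in range(2 ** j):
        row = [0.0] * n_samples
        for n in range(k * width, k * width + width // 2):
            row[n] = amp
        for n in range(k * width + width // 2, (k + 1) * width):
            row[n] = -amp
        rows.append(row)
    return rows


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def multiscale_fit(Y, J):
    residue = [row[:] for row in Y]           # Delta Y_{-1} = Y
    weights, norms = [], []
    for j in range(J + 1):
        Psi = haar_rows(len(Y[0]), j)
        Phi = Psi                             # Haar: the dual family equals the primal one
        C = matmul(residue, transpose(Phi))   # inner-product rule, valid as Psi Phi^T = I
        Y_hat = matmul(C, Psi)                # approximation of Delta Y_{j-1}, (17)
        residue = [[r - y for r, y in zip(rr, yr)] for rr, yr in zip(residue, Y_hat)]
        weights.append(C)                     # C_j
        norms.append(math.sqrt(sum(v * v for row in residue for v in row)))
    return weights, residue, norms


Y = [[3.0, 1.0, 0.0, 2.0, -1.0, -3.0, 0.0, -2.0]]  # m = 1, s = 8, made-up zero-mean data
weights, residue, norms = multiscale_fit(Y, J=2)
print([round(v, 4) for v in norms])  # residue norm after each scale
```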

D. The updating algorithm

In (17), each row of $C_j\Psi_j(X)$ is a linear combination of the rows of $\Psi_j(X)$. The weight matrix $C_j$ can be determined using an LMS-type algorithm. On the other hand, if we start from a biorthogonal wavelet pair $(\psi, \varphi)$ such that $\Psi_j(X)\Phi_j(X)^T = I$, the identity, we can use a simple inner-product rule to determine the weights, as discussed in Section III-A.
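The text only states that an LMS-type algorithm can be used for $C_j$. The sketch below uses one standard sample-by-sample LMS update, $C_j \leftarrow C_j + \mu\,(\Delta y_t - C_j\Psi_j(x_t))\,\Psi_j(x_t)^T$, where $\Delta y_t$ denotes column $t$ of $\Delta Y_{j-1}$. The step size $\mu$, the epoch count, and the hard-coded pair of sampled Haar wavelets are all assumptions. Because those two rows are orthonormal, the inner-product rule $C_j = \Delta Y_{j-1}\Psi_j(X)^T$ gives the same weights, and the sketch computes it for comparison.

```python
def lms_weights(residue, Psi, mu=0.5, epochs=200):
    """LMS estimate of C_j in Delta Y_{j-1} ~ C_j Psi_j(X) (assumed update rule)."""
    m, n_j, s = len(residue), len(Psi), len(residue[0])
    C = [[0.0] * n_j for _ in range(m)]
    for _ in range(epochs):
        for t in range(s):
            psi_t = [Psi[n][t] for n in range(n_j)]  # column t, i.e. Psi_j(x_t)
            for i in range(m):
                err = residue[i][t] - sum(c * p for c, p in zip(C[i], psi_t))
                for n in range(n_j):
                    C[i][n] += mu * err * psi_t[n]
    return C


def inner_product_weights(residue, Phi):
    """C_j = Delta Y_{j-1} Phi_j(X)^T, valid when Psi_j(X) Phi_j(X)^T = I."""
    return [[sum(r * f for r, f in zip(row, phi_row)) for phi_row in Phi] for row in residue]


# Two sampled Haar wavelets at scale 1 on s = 8 points (orthonormal rows, so Phi = Psi).
Psi = [[0.5, 0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, -0.5, -0.5]]
C_true = [[2.0, -1.0]]  # made-up weights used to synthesise the residue
residue = [[sum(c * Psi[n][t] for n, c in enumerate(C_true[0])) for t in range(8)]]

print(lms_weights(residue, Psi))            # close to [[2.0, -1.0]]
print(inner_product_weights(residue, Psi))  # [[2.0, -1.0]]
```

LMS needs many passes and a suitable $\mu$, whereas the inner-product rule is a single product once biorthogonality holds. That trade-off is why the text prefers the latter when it is available.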

Fig. 3. Modeling at scale j

Consider the block diagram representation of the wavelet network structure shown in Fig. 3. At the start, we assume that the initial value of the output $C_j\Psi_j(X)$ at scale $j$ is zero. From (17), this means that the initial value of the weight matrix is zero and that of the residue is $\Delta Y_j = \Delta Y_{j-1}$. Let $C_j^{(0)}$ and $\Delta Y_j^{(0)}$ denote these initial values, respectively. Then, we recursively calculate the weights $C_j^{(n)}$ and the residues $\Delta Y_j^{(n)}$, $n = 1, 2, \ldots$, using the two relations

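$$C_j^{(n+1)} = C_j^{(n)} + \langle \Delta Y_j^{(n)}, \Phi_j(X) \rangle \tag{18}$$

and

$$\Delta Y_j^{(n+1)} = \Delta Y_{j-1} - C_j^{(n+1)} \Psi_j(X), \tag{19}$$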


where $\Phi_j(X) = [\Phi_j(x_0)\ \Phi_j(x_1)\ \cdots\ \Phi_j(x_{s-1})]$ and $X$ is such that $\Psi_j(X)\Phi_j(X)^T = I$. Note that if all the rows $x_p$ of $X$, $p = 0, 1, \ldots, l-1$, visit all the samples of the wavelet function $\psi$, then the above iteration is not necessary, as $\langle \Delta Y_j^{(n)}, \Phi_j(X) \rangle = 0$ for all $n > 0$.